1-800-BAD-CODE committed on
Commit
1d4fc79
1 Parent(s): affeb69

Update README.md

  1. README.md +81 -0
README.md CHANGED
@@ -733,3 +733,84 @@ seg test report:
  ```
 
  </details>
+
+
+
+ # Acronyms, abbreviations, and bi-capitalized words
+
+ This section briefly demonstrates the model's behavior when presented with the following:
+
+ 1. Acronyms: "NATO"
+ 2. Fake acronyms: "NHTG" in place of "NATO"
+ 3. Ambiguous terms that could be acronyms or proper nouns: "Tuny"
+ 4. Bi-capitalized words: "McDavid"
+ 5. Initialisms: "p.m."
+
+ <details open>
+
+ <summary>Acronyms, etc. inputs</summary>
+
+ ```python
+ from typing import List
+
+ from punctuators.models import PunctCapSegModelONNX
+
+ # Load the pretrained punctuation, true-casing, and sentence boundary detection model.
+ m: PunctCapSegModelONNX = PunctCapSegModelONNX.from_pretrained(
+     "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"
+ )
+
+ # Lower-cased, unpunctuated inputs covering each case listed above.
+ input_texts = [
+     "the us is a nato member as a nato member the country enjoys security guarantees notably article 5",
+     "the us is a nhtg member as a nhtg member the country enjoys security guarantees notably article 5",
+     "the us is a tuny member as a tuny member the country enjoys security guarantees notably article 5",
+     "connor andrew mcdavid is a canadian professional ice hockey centre and captain of the edmonton oilers of the national hockey league the oilers selected him first overall in the 2015 nhl entry draft mcdavid spent his childhood playing ice hockey against older children",
+     "please rsvp for the party asap preferably before 8 pm tonight",
+ ]
+
+ # With apply_sbd=True, each input yields a list of segmented sentences.
+ results: List[List[str]] = m.infer(
+     texts=input_texts, apply_sbd=True,
+ )
+ for input_text, output_texts in zip(input_texts, results):
+     print(f"Input: {input_text}")
+     print("Outputs:")
+     for text in output_texts:
+         print(f"\t{text}")
+     print()
+ ```
+
+ </details>
+
+
+ <details open>
+
+ <summary>Expected output</summary>
+
+ ```text
+ Input: the us is a nato member as a nato member the country enjoys security guarantees notably article 5
+ Outputs:
+     The U.S. is a NATO member.
+     As a NATO member, the country enjoys security guarantees, notably Article 5.
+
+ Input: the us is a nhtg member as a nhtg member the country enjoys security guarantees notably article 5
+ Outputs:
+     The U.S. is a NHTG member.
+     As a NHTG member, the country enjoys security guarantees, notably Article 5.
+
+ Input: the us is a tuny member as a tuny member the country enjoys security guarantees notably article 5
+ Outputs:
+     The U.S. is a Tuny member.
+     As a Tuny member, the country enjoys security guarantees, notably Article 5.
+
+ Input: connor andrew mcdavid is a canadian professional ice hockey centre and captain of the edmonton oilers of the national hockey league the oilers selected him first overall in the 2015 nhl entry draft mcdavid spent his childhood playing ice hockey against older children
+ Outputs:
+     Connor Andrew McDavid is a Canadian professional ice hockey centre and captain of the Edmonton Oilers of the National Hockey League.
+     The Oilers selected him first overall in the 2015 NHL entry draft.
+     McDavid spent his childhood playing ice hockey against older children.
+
+ Input: please rsvp for the party asap preferably before 8 pm tonight
+ Outputs:
+     Please RSVP for the party ASAP, preferably before 8 p.m. tonight.
+ ```
+
+ </details>
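
As a rough aid for reading outputs like the ones in this section, the casing categories it lists can be approximated with a small token classifier. This is a minimal sketch only: `classify_token` and its labels are hypothetical names invented for illustration, they are not part of the `punctuators` package, and the heuristics say nothing about how the model itself makes its decisions.

```python
import re


# Hypothetical helper for illustration; not part of the `punctuators` package.
def classify_token(token: str) -> str:
    """Roughly label the casing pattern of one whitespace-delimited token."""
    # Drop trailing clause punctuation, but keep periods for now so that
    # dotted forms such as "p.m." or "U.S." can still be recognized.
    core = token.rstrip(",;:!?")
    if re.fullmatch(r"(?:[A-Za-z]\.){2,}", core):
        return "dotted abbreviation"
    # Anything else loses its sentence-final period before inspection.
    core = core.rstrip(".")
    if not core.isalpha():
        return "other"
    if len(core) > 1 and core.isupper():
        return "acronym"
    if any(ch.isupper() for ch in core[1:]):
        return "bi-capitalized"
    if core[0].isupper():
        return "capitalized"
    return "lowercase"


if __name__ == "__main__":
    sentence = "Please RSVP for the party ASAP, preferably before 8 p.m. tonight."
    for token in sentence.split():
        print(f"{token}\t{classify_token(token)}")
```

Run over the model's output sentences, a helper like this makes it easy to see which tokens were fully upper-cased, which received an internal capital, and which were merely capitalized. It cannot distinguish a real acronym such as "NATO" from a fake one such as "NHTG", since both have the same casing pattern.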