gyenist committed on
Commit 60cc224
1 Parent(s): 41686cb

Update README.md

Files changed (1)
  1. README.md +2 -11
README.md CHANGED
@@ -38,7 +38,7 @@ This model can restore punctuation and auto-capitalize lower cased Hungarian tex
  ### Model Description
 
  I aim to fill the gap between Speech Recognition (speech2text) and downstream NLP tasks by developing a model for Automatic Punctuation Restoration (APR) in Hungarian called ‘hupunct’, that has raw unpunctuated lower-cased text as its input, and has the corrected, punctuated text as its output. The solution is based on a widely used NLP technique, which involves the finetuning of a pretrained special deep neural network, a Transformer.
- The hupunct model, after training for less than one epoch on the dataset generated from the Hungarian Web Corpus reached a test micro average F1-score of 87.2% and macro average F1-score of 74,1%. The CDQ macro F1-score achieved was 83.7%. This surpasses the current state-of-the art Hungarian model, although on a different but arguably harder dataset, even with using only one prediction per token. The model learned to restore punctuations belonging to the additional base punctuation classes and all the upper versions of those classes to a reasonable extent. Additionally, it can also auto-capitalize, which is a convenient feature. See some examples showing the model capabilities in Text Box 4 and Text Box 5 of the Appendix. The finetuning of huBERT for the APR task in Hungarian proved to be a powerful and very practical approach, especially with the usage of the HF platform.
+ The hupunct model, after training for less than one epoch on the dataset generated from the Hungarian Web Corpus reached a test micro average F1-score of 87.2% and macro average F1-score of 74,1%. The CDQ macro F1-score achieved was 83.7%. This surpasses the current state-of-the art Hungarian model, although on a different but arguably harder dataset, even with using only one prediction per token. The model learned to restore punctuations belonging to the additional base punctuation classes and all the upper versions of those classes to a reasonable extent. Additionally, it can also auto-capitalize, which is a convenient feature. The finetuning of huBERT for the APR task in Hungarian proved to be a powerful and very practical approach, especially with the usage of the HF platform.
 
  ### Examples
 
@@ -49,13 +49,4 @@ Output:
  'Gerendai Páltól a következőt idézzük: Gyermekkorom óta szeretem a Balatont. A balatoni tájak mindig is lenyűgöztek, és néha-néha, mikor a Balaton partján sétálok, szívemet elönti a szeretet. Hogyan lehet valami ilyen szép? A következő vendégünk Hambuch Kevin, a Balatonfenyvesi Egyetem doktora, a Knorr-Bremse kutatás-fejlesztésért felelős vezetője. Kevin ilyen-olyan projektekben vett részt a Mta-val közösen, majd 1999-ben alapítottak barátjával, Csisztapusztai Arnolddal egy céget, megpedíg a Gránit Kft-t. Ezután kezdte meg tevékenységét a német cégnél, ahol a Gránit Kft-ben szerzett tapasztalatát kamatoztatja.'
 
 
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
+ - **Developed by:** Tamás Gyenis - tamgyen@gmail.com
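The model description frames hupunct as mapping raw lower-cased text to punctuated, capitalized text, with "upper" versions of each base punctuation class, and mentions "one prediction per token". One common way to realize this is token classification: predict one label per word and rebuild the text from those labels. The sketch below is illustrative only. It assumes a hypothetical label scheme (a base class such as `COMMA` or `PERIOD`, optionally suffixed with `_UPPER` to capitalize the word) and word-level predictions. The actual hupunct label set and its subword handling are not given in this commit.

```python
# Hypothetical mapping from a base punctuation class to the mark appended
# after a word. This is NOT the documented hupunct label set.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?", "COLON": ":"}


def restore(words, labels):
    """Rebuild punctuated, capitalized text from per-word labels.

    Each label is assumed to be BASE or BASE_UPPER, where BASE selects the
    punctuation appended after the word and the _UPPER suffix capitalizes
    the word's first letter.
    """
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    out = []
    for word, label in zip(words, labels):
        base, _, flag = label.partition("_")
        if flag == "UPPER":
            word = word[:1].upper() + word[1:]
        out.append(word + PUNCT[base])
    return " ".join(out)


# Lower-cased fragments of the README example, with hand-written labels.
words = "gyermekkorom óta szeretem a balatont hogyan lehet valami ilyen szép".split()
labels = ["O_UPPER", "O", "O", "O", "PERIOD_UPPER", "O_UPPER", "O", "O", "O", "QUESTION"]
print(restore(words, labels))
# Gyermekkorom óta szeretem a Balatont. Hogyan lehet valami ilyen szép?
```

Folding capitalization into the punctuation label keeps a single classification head per token. This is one plausible reading of "upper versions of those classes", not a confirmed description of the model.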
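The README reports both a micro-averaged F1 (87.2%) and a macro-averaged F1 (74.1%). Such a gap is typical when rarer classes score worse. Micro-averaging pools true positives, false positives and false negatives over all scored classes, so frequent classes dominate. Macro-averaging computes F1 per class and gives every class equal weight. The minimal Python sketch below shows the difference. The label names, and the choice to exclude a "no punctuation" label `O` from scoring, are assumptions for illustration, because the commit does not state how the reported scores were computed.

```python
from collections import Counter


def f1_scores(gold, pred, labels):
    """Return (micro_f1, macro_f1) over the classes listed in `labels`.

    gold and pred are equal-length sequences of per-token labels. Labels not
    in `labels` (e.g. an assumed "O" = no punctuation) are not scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed

    def f1(t, f_p, f_n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    per_class = [f1(tp[c], fp[c], fn[c]) for c in labels]
    macro = sum(per_class) / len(per_class)
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return micro, macro


# Toy data: one question mark is mistaken for a period.
gold = ["O", "COMMA", "O", "PERIOD", "O", "COMMA", "QUESTION"]
pred = ["O", "COMMA", "O", "PERIOD", "O", "COMMA", "PERIOD"]
micro, macro = f1_scores(gold, pred, ["COMMA", "PERIOD", "QUESTION"])
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.750 macro=0.556
```

A single error on the rare `QUESTION` class zeroes that class's F1. This drags the macro average well below the micro average, which barely moves.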