dorinalakatos committed on
Commit
b9c9a3b
1 Parent(s): 6cdcc4d

Update README.md

  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
  ---
  license: cc-by-nc-sa-4.0
+ language:
+ - hu
+ - en
+ tags:
+ - translation
+ - opennmt
  ---
+
+ inference: false
+ ---
+
+ ### Introduction
+
+ A Hungarian-to-English translation model trained on the [Hunglish2](http://mokk.bme.hu/resources/hunglishcorpus/) dataset using OpenNMT.
+
+ ### Usage
+
+ Install the necessary dependencies:
+
+ ```bash
+ pip3 install ctranslate2 pyonmttok huggingface_hub
+ ```
+
+ Tokenize and translate a sentence in Python:
+
+
+ ```python
+ import ctranslate2
+ import pyonmttok
+ from huggingface_hub import snapshot_download
+ model_dir = snapshot_download(repo_id="SZTAKI-HLT/opennmt-hu-en", revision="main")  # download the model files
+
+ tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=model_dir + "/sp_m.model")  # SentencePiece model
+ tokens, _ = tokenizer.tokenize("Hello világ")  # tokenize() returns (tokens, features)
+
+ translator = ctranslate2.Translator(model_dir)
+ results = translator.translate_batch([tokens])  # one result per input sentence
+ print(tokenizer.detokenize(results[0].hypotheses[0]))  # best hypothesis; needs CTranslate2 >= 2.0
+ ```
+
+
+ ### Citation
+
+ If you use our model, please cite the following paper:
+ ```
+
+ @inproceedings{nagy2022syntax,
+ title={Syntax-based data augmentation for Hungarian-English machine translation},
+ author={Nagy, Attila and Nanys, Patrick and Konr{\'a}d, Bal{\'a}zs Frey and Bial, Bence and {\'A}cs, Judit},
+ booktitle={XVIII. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2022)},
+ year={2022},
+ publisher={Szegedi Tudományegyetem, Informatikai Intézet},
+ }
+
+ ```
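
The Usage snippet added in this commit translates a single sentence. A minimal sketch of sending several sentences through one `translate_batch` call could look like the following. It is not part of the commit: the helper names `translate_sentences` and `load_model` are hypothetical, and the sketch assumes CTranslate2 2.0 or later, where each result exposes a `hypotheses` list.

```python
from typing import Iterable, List


def translate_sentences(sentences: Iterable[str], tokenizer, translator) -> List[str]:
    """Translate several sentences with a single translate_batch call (hypothetical helper)."""
    # pyonmttok's tokenize() returns a (tokens, features) pair; keep only the tokens.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    if not batch:
        return []
    results = translator.translate_batch(batch)
    # Each result carries one or more hypotheses; take the first one.
    return [tokenizer.detokenize(result.hypotheses[0]) for result in results]


def load_model(repo_id: str = "SZTAKI-HLT/opennmt-hu-en"):
    """Download the model and build the tokenizer and translator (hypothetical helper)."""
    # Imported lazily so translate_sentences can be used without these packages installed.
    import ctranslate2
    import pyonmttok
    from huggingface_hub import snapshot_download

    model_dir = snapshot_download(repo_id=repo_id, revision="main")
    # The SentencePiece file name sp_m.model is taken from the README snippet.
    tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=model_dir + "/sp_m.model")
    translator = ctranslate2.Translator(model_dir)
    return tokenizer, translator
```

With the packages installed, `tokenizer, translator = load_model()` followed by `translate_sentences(["Hello világ"], tokenizer, translator)` would return a list with one translated string per input sentence. Passing the tokenizer and translator in as arguments keeps the batching logic separate from the model download.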