SaiedAlshahrani committed
Commit 999fe5c
1 Parent(s): ddf6361

Update README.md

Files changed (1): README.md (+0, −8)
It achieves the following results on the evaluation set:

- Pseudo-Perplexity: 23.70
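Pseudo-perplexity for a masked LM is conventionally computed by masking each token in turn, scoring it with the model, and exponentiating the negative mean of the resulting pseudo-log-likelihoods. A minimal sketch of the aggregation step, assuming the per-token log-probabilities have already been collected from the model (the values below are illustrative, not model outputs):

```python
import math

def pseudo_perplexity(token_logprobs):
    """Turn per-token pseudo-log-likelihoods into a pseudo-perplexity.

    Each entry is log P(token_i | all other tokens), obtained by masking
    token i and reading off the masked LM's probability for it.
    """
    mean_ll = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_ll)

# Illustrative values only; real scores come from the model.
print(round(pseudo_perplexity([-3.2, -3.1, -3.3]), 2))  # → 24.53
```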
## Model description

We trained this Arabic Wikipedia Masked Language Model (arRoBERTa<sub>BASE</sub>) to evaluate its performance on the Fill-Mask task with the Masked Arab States Dataset ([MASD](https://huggingface.co/datasets/SaiedAlshahrani/MASD)) and to measure the *impact* of **template-based translation** on the Egyptian Arabic Wikipedia edition.
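For illustration, the fill-mask evaluation can be queried through the `transformers` pipeline. A minimal sketch; the model id, prompt, and gold token below are assumptions for illustration, not taken from this card:

```python
def top_k_tokens(text, model_id="SaiedAlshahrani/arRoBERTa-base", k=10):
    """Return the top-k fill-mask predictions for `text` (one <mask> token).

    NOTE: `model_id` is an illustrative guess; check the actual repository
    name on the Hugging Face Hub before running.
    """
    from transformers import pipeline  # deferred import: heavy dependency

    fill = pipeline("fill-mask", model=model_id, top_k=k)
    return [pred["token_str"].strip() for pred in fill(text)]

def hit_at_k(pred_tokens, gold):
    """True if the gold token appears among the ranked predictions."""
    return gold in pred_tokens

# Example (downloads the model; the prompt translates to
# "The capital of Egypt is <mask>."):
#   preds = top_k_tokens("عاصمة مصر هي <mask>.")
#   hit_at_k(preds, "القاهرة")  # was "Cairo" among the top 10?
```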
For more details about the experiment, please **read** and **cite** our paper:

```
}
```

## Intended uses & limitations

We do **not** recommend using this model, because it was trained *only* on Arabic Wikipedia articles, <u>unless</u> you fine-tune it on a large, organic, and representative Arabic dataset.
## Training and evaluation data

We trained this model on Arabic Wikipedia articles ([SaiedAlshahrani/Arabic_Wikipedia_20230101_bots](https://huggingface.co/datasets/SaiedAlshahrani/Arabic_Wikipedia_20230101_bots)) without any validation or evaluation data (training data only) due to a lack of computational power.
## Training procedure

We trained this model on the Paperspace GPU cloud service, using a machine with 8 CPUs, 45GB of RAM, and an A6000 GPU with 48GB of VRAM.
The following hyperparameters were used during training:

- lr_scheduler_type: linear
- num_epochs: 5
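The listed hyperparameters map directly onto `transformers.TrainingArguments`; a hedged sketch (only the scheduler type and epoch count come from the card, the output directory is a placeholder, and every unlisted hyperparameter is left at its library default rather than the value actually used):

```python
from transformers import TrainingArguments

# Only the hyperparameters shown in this card excerpt are set explicitly;
# unlisted values (batch size, learning rate, ...) are NOT reproduced here.
args = TrainingArguments(
    output_dir="arroberta-base",  # placeholder path
    lr_scheduler_type="linear",   # from the card
    num_train_epochs=5,           # from the card
)
```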
### Training results

| Epoch | Step | Training Loss |

|:--------------:|:------------------------:|:----------------------:|:-------------------------:|:----------:|:--------:|
| 17048.756800 | 248.355000 | 0.970000 | 140390797515571200.000000 | 3.639375 | 5.000000 |
### Evaluation results

This arRoBERTa<sub>BASE</sub> model has been evaluated on the Masked Arab States Dataset ([SaiedAlshahrani/MASD](https://huggingface.co/datasets/SaiedAlshahrani/MASD)).

| K=10 | K=50 | K=100 |
|:------:|:-----:|:------:|
| 43.12% | 45% | 50.62% |
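Assuming the K=10/50/100 scores are top-K hit rates, i.e. the share of MASD prompts whose gold answer appears among the model's K highest-ranked fill-mask predictions (an interpretation of the table, not stated on the card), the aggregation can be sketched as:

```python
def top_k_accuracy(results, k):
    """Percentage of examples whose gold answer is in the top-k predictions.

    `results` is a list of (ranked_predictions, gold_answer) pairs;
    predictions beyond position k are ignored.
    """
    hits = sum(gold in preds[:k] for preds, gold in results)
    return 100.0 * hits / len(results)

# Toy illustration with made-up predictions (not MASD data):
toy = [(["القاهرة", "دمشق"], "القاهرة"),
       (["بغداد", "عمان"], "الرياض")]
print(top_k_accuracy(toy, 10))  # → 50.0
```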
### Framework versions

- Datasets 2.9.0