# text_generation_bangla_model

BanglaCLM dataset:

- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB

## Model description

- Context size: 128
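
A minimal text-generation sketch using the Transformers `pipeline` API, assuming the checkpoint is published with TensorFlow weights under the repository id `shahidul034/text_generation_bangla_model` (adjust the id and prompt to your setup):

```python
from transformers import pipeline

# Assumed repository id; replace with the actual Hugging Face model id if it differs.
generator = pipeline(
    "text-generation",
    model="shahidul034/text_generation_bangla_model",
    framework="tf",  # the card lists TensorFlow as the training framework
)

# Generate a continuation for a Bangla prompt (the prompt text is illustrative).
print(generator("বাংলাদেশের রাজধানী", max_new_tokens=50)[0]["generated_text"])
```

Keep the prompt plus generated tokens within the 128-token context size noted above.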

## Training and evaluation data

The BanglaCLM dataset is split into a training set (90%) and a validation set (10%).
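
As a rough sketch of that split, assuming the corpus has been collected into a local text file (`banglaclm.txt` is a placeholder name), the Datasets library can produce a 90/10 split like this:

```python
from datasets import load_dataset

# Placeholder file name; the assembled BanglaCLM corpus is not distributed with this card.
raw = load_dataset("text", data_files={"train": "banglaclm.txt"})["train"]

# 90% training / 10% validation, matching the description above (seed is illustrative).
split = raw.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(len(train_ds), len(val_ds))
```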

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Vocabulary size of tokenizer: 50256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40772228
- Training precision: float32
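
The sketch below shows how a byte-level BPE tokenizer and an AdamW-style optimizer with linear warmup could be set up to match the values above. The corpus file name, the use of `ByteLevelBPETokenizer`, and the choice of `create_optimizer` are assumptions for illustration, not a verbatim copy of the training script:

```python
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import create_optimizer

# Byte-level BPE tokenizer with the vocabulary size listed above.
# "banglaclm.txt" is a placeholder for the raw corpus file(s).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["banglaclm.txt"],
    vocab_size=50256,
    special_tokens=["<|endoftext|>"],
)
os.makedirs("bangla_tokenizer", exist_ok=True)
tokenizer.save_model("bangla_tokenizer")

# AdamW with weight decay and a linear warmup schedule, using the learning rate,
# warmup steps, weight decay rate, and total step count from the list above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=40772228,
    num_warmup_steps=10000,
    weight_decay_rate=0.01,
)
```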

### Training results

- Perplexity score: 2.86
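
Perplexity here is the exponential of the mean token-level cross-entropy loss, so the reported score implies a validation loss of roughly 1.05; a quick sanity check:

```python
import math

# Perplexity = exp(mean cross-entropy loss), so a perplexity of 2.86
# corresponds to a loss of about ln(2.86) ≈ 1.05.
print(math.log(2.86))    # ≈ 1.0508
print(math.exp(1.0508))  # ≈ 2.86
```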

### Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2

### Citation

If you find this model helpful, please cite:

```
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  volume={},
  number={},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}
```