---
license: mit
---
# Bangla FastText Model
This is a pre-trained FastText model for the Bengali language.
It was built for the [bnlp](https://github.com/sagorbrur/bnlp) package.
## Datasets
- [Wikipedia dump datasets](https://dumps.wikimedia.org/bnwiki/latest/)
## Training Details
- FastText trained on 20M total words, with vocab size = 1171011, epochs = 50, embedding dimension = 300
## Evaluation Details
- training loss = 0.318668
## Usage
- `pip install -U bnlp_toolkit`
- `pip install fasttext==0.9.2`
- Generate Vector Using Pretrained Model
```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
word = "গ্রাম"
# Path to the downloaded pretrained binary model
model_path = "bengali_fasttext_wiki.bin"
# Look up the embedding for a single word
word_vector = bft.generate_word_vector(model_path, word)
print(word_vector.shape)
print(word_vector)
```
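- Compare Two Word Vectors (illustrative sketch)

A common next step after generating vectors is to measure how similar two words are. The helper below is a minimal sketch and is not part of bnlp: `cosine_similarity` is a hypothetical name, and the toy vectors stand in for the 300-dimensional vectors the model returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors; in practice pass two outputs of generate_word_vector.
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]
print(cosine_similarity(v1, v2))
```

Values near 1 indicate vectors pointing in nearly the same direction. The function accepts any sequence of numbers, so NumPy arrays should work as well as lists.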
- Train Bengali FastText Model
```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
data = "raw_text.txt"  # plain-text training corpus
model_name = "saved_model.bin"  # output path for the trained model
epoch = 50
bft.train(data, model_name, epoch)
```
- Generate Vector File from Fasttext Binary Model
```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
model_path = "mymodel.bin"  # existing FastText binary model
out_vector_name = "myvector.txt"  # text vector file to write
bft.bin2vec(model_path, out_vector_name)
```
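- Read the Generated Vector File (illustrative sketch)

Once a text vector file exists, it can be read back without the binary model. The sketch below assumes the file follows the common word2vec-style text layout: a header line with the vocabulary size and dimension, then one line per word with the word followed by its values. That layout is an assumption about what `bin2vec` writes, so check it against your file. `load_vectors` is a hypothetical helper, and the in-memory sample replaces a real file.

```python
import io


def load_vectors(lines):
    """Parse word2vec-style text: header 'count dim', then 'word v1 ... vdim'."""
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory sample in the assumed format (3 dimensions instead of 300).
sample = io.StringIO("2 3\nগ্রাম 0.1 0.2 0.3\nশহর 0.4 0.5 0.6\n")
dim, vectors = load_vectors(sample)
print(dim, len(vectors))
```

For a real file, pass an open handle instead, for example `open("myvector.txt", encoding="utf-8")`. The function skips lines whose length does not match the header dimension instead of raising an error.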