---
license: mit
---

# Bangla FastText Model

This is a pre-trained FastText model for the Bengali language.

This model is built for the [bnlp](https://github.com/sagorbrur/bnlp) package.

## Datasets

- [Wikipedia dump datasets](https://dumps.wikimedia.org/bnwiki/latest/)

## Training Details

- FastText trained with total words = 20M, vocabulary size = 1,171,011, epochs = 50, embedding dimension = 300

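For readers who prefer the `fasttext` package directly, the settings listed above might map onto `fasttext.train_unsupervised` roughly as follows. This is a sketch under assumptions, not the script used to build this model: the `train_embeddings` helper and the `TRAINING_PARAMS` name are hypothetical, and the model type (`skipgram` or `cbow`) is not stated in this card, so it is exposed as a parameter.

```python
# Hyperparameters stated in this model card; all other settings stay at fastText defaults.
TRAINING_PARAMS = {"dim": 300, "epoch": 50}


def train_embeddings(corpus_path, output_path, model_type="skipgram"):
    """Train unsupervised fastText embeddings and save the binary model.

    model_type is an assumption: the model card does not say whether
    skipgram or cbow was used.
    """
    import fasttext  # imported lazily; requires `pip install fasttext`

    model = fasttext.train_unsupervised(corpus_path, model=model_type, **TRAINING_PARAMS)
    model.save_model(output_path)
    return model
```

Calling `train_embeddings("raw_text.txt", "saved_model.bin")` would then train on a plain-text corpus and write a `.bin` file.
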
## Evaluation Details

- Training loss = 0.318668

## Usage

- `pip install -U bnlp_toolkit`
- `pip install fasttext==0.9.2`
- Generate Vector Using Pretrained Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
word = "গ্রাম"
model_path = "bengali_fasttext_wiki.bin"

# Look up the embedding for a single word using the pretrained binary model
word_vector = bft.generate_word_vector(model_path, word)
print(word_vector.shape)
print(word_vector)
```
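
Word vectors are commonly compared with cosine similarity. The sketch below is a minimal, dependency-free illustration. The `cosine_similarity` helper is hypothetical and not part of bnlp, and it assumes `generate_word_vector` returns a one-dimensional sequence of floats (for example a NumPy array, as the `.shape` call above suggests).

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector has similarity 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 300-dimensional embeddings
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

With a loaded model, the same helper could be applied to the outputs of two `bft.generate_word_vector(model_path, ...)` calls to compare two words.
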

- Train Bengali FastText Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
data = "raw_text.txt"           # plain-text training corpus
model_name = "saved_model.bin"  # output path for the trained binary model
epoch = 50                      # number of training epochs

bft.train(data, model_name, epoch)
```
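
The training call above takes a plain-text file. One possible way to prepare such a file is to write one sentence per line, splitting on the Bengali danda (`।`) and other sentence-ending punctuation. This is only a sketch: the `split_sentences` and `write_corpus` helpers are hypothetical, and the one-sentence-per-line layout is an assumed convention rather than a documented requirement of bnlp.

```python
import re

# Split after a Bengali danda, question mark, or exclamation mark followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[।?!])\s+")


def split_sentences(text):
    """Split raw text into sentences, keeping the closing punctuation."""
    collapsed = " ".join(text.split())  # collapse newlines and repeated spaces
    return [s for s in _SENTENCE_END.split(collapsed) if s]


def write_corpus(raw_text, output_path):
    """Write one sentence per line to a UTF-8 file and return the line count."""
    sentences = split_sentences(raw_text)
    with open(output_path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(sentence + "\n")
    return len(sentences)
```

For example, `write_corpus(some_text, "raw_text.txt")` would produce a file that can be passed as `data` in the training snippet.
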

- Generate Vector File from Fasttext Binary Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()

model_path = "mymodel.bin"
out_vector_name = "myvector.txt"

# Export the word vectors stored in the binary model to a text file
bft.bin2vec(model_path, out_vector_name)
```
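
The exported text file can be read back without any extra dependencies. The sketch below assumes the file follows the common word2vec-style text layout (a header line with the vocabulary size and dimension, then one word and its space-separated values per line). That layout is an assumption about what `bin2vec` writes, and the `load_vectors` helper is hypothetical.

```python
def load_vectors(path):
    """Parse a word2vec-style text file into ({word: [floats]}, dimension)."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().split()
        dim = int(header[1])  # header[0] is the declared vocabulary size
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors, dim
```

Calling `load_vectors("myvector.txt")` would then return the word-to-vector mapping together with the embedding dimension.
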