sagorsarker/bangla-fasttext

---
license: mit
---
# Bangla FastText Model
This is a pre-trained FastText model for the Bengali language.

This model was built for the [bnlp](https://github.com/sagorbrur/bnlp) package.

## Datasets
- [Bengali Wikipedia dump](https://dumps.wikimedia.org/bnwiki/latest/)

## Training Details
- Total words in the training corpus: 20M
- Vocabulary size: 1,171,011
- Epochs: 50
- Embedding dimension: 300
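
As a rough sanity check on model size (an estimate, not a measured figure), the vocabulary size and embedding dimension above imply a word-vector matrix of 1,171,011 × 300 values. The sketch below assumes each value is stored as a 32-bit float, fastText's default; `word_matrix_bytes` is a hypothetical helper.

```py
# Rough size estimate for the word-vector matrix described above.
# Assumption: each value is a 32-bit float (4 bytes), fastText's default storage.
VOCAB_SIZE = 1_171_011
EMBEDDING_DIM = 300
BYTES_PER_FLOAT32 = 4


def word_matrix_bytes(vocab_size: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Return the bytes needed for a vocab_size x dim matrix (hypothetical helper)."""
    return vocab_size * dim * bytes_per_value


n_bytes = word_matrix_bytes(VOCAB_SIZE, EMBEDDING_DIM)
print(f"{VOCAB_SIZE * EMBEDDING_DIM:,} values")            # 351,303,300 values
print(f"{n_bytes:,} bytes ≈ {n_bytes / 1024**3:.2f} GiB")  # 1,405,213,200 bytes ≈ 1.31 GiB
```

The `.bin` file also stores other parameters (for example, subword vectors when trained with fastText's defaults), so the actual file and its memory footprint are likely larger than this figure.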

## Evaluation Details
- Training loss: 0.318668

## Usage
- `pip install -U bnlp_toolkit`
- `pip install fasttext==0.9.2`
- Generate a word vector using the pretrained model
```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
word = "গ্রাম"
# Path to the downloaded pretrained binary model
model_path = "bengali_fasttext_wiki.bin"
word_vector = bft.generate_word_vector(model_path, word)
print(word_vector.shape)  # one 300-dimensional vector for the word
print(word_vector)
```
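
Word vectors are commonly compared with cosine similarity. The sketch below is not part of bnlp: `cosine_similarity` is a hypothetical helper written in plain Python, so it accepts the NumPy array returned by `generate_word_vector` or any other sequence of numbers.

```py
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper, not a bnlp API)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = math.fsum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(math.fsum(a * a for a in u))
    norm_v = math.sqrt(math.fsum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors; vectors from this model have 300 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

In practice, pass two vectors produced by `bft.generate_word_vector`; values closer to 1.0 indicate words the model places closer together.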

- Train a Bengali FastText model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
data = "raw_text.txt"           # plain-text training corpus
model_name = "saved_model.bin"  # output path for the trained binary model
epoch = 50                      # number of training epochs
bft.train(data, model_name, epoch)
```
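
`bft.train` takes the path to a plain-text corpus. fastText reads UTF-8 text and splits tokens on whitespace, so one simple way to build such a file from a list of sentences is to write one sentence per line. `write_corpus` below is a hypothetical sketch, not part of bnlp; it only normalizes whitespace and skips empty sentences, and any further cleaning is left to you.

```py
from pathlib import Path
from typing import Iterable


def write_corpus(sentences: Iterable[str], path: str) -> int:
    """Write one whitespace-normalized sentence per line as UTF-8; return the line count.

    Hypothetical helper for preparing the `data` file passed to bft.train; not a bnlp API.
    """
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for sentence in sentences:
            tokens = sentence.split()  # collapses runs of spaces, tabs, and newlines
            if not tokens:
                continue  # skip empty sentences
            f.write(" ".join(tokens) + "\n")
            count += 1
    return count


n = write_corpus(["আমি  বাংলায় গান গাই।", "", "গ্রাম\tবাংলা"], "raw_text.txt")
print(n)  # 2
```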

- Generate a vector file from a FastText binary model
```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()

model_path = "mymodel.bin"        # trained FastText binary model
out_vector_name = "myvector.txt"  # output text file of word vectors
bft.bin2vec(model_path, out_vector_name)
```
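
The text file written by `bin2vec` can be read back without fastText. The sketch below assumes the standard fastText `.vec` layout (a header line with the word count and dimension, then one word per line followed by its vector values, separated by spaces); `load_vec` is a hypothetical helper, not a bnlp API.

```py
from typing import Dict, List


def load_vec(path: str) -> Dict[str, List[float]]:
    """Load a fastText-style .vec text file into a {word: vector} dict.

    Hypothetical helper, not a bnlp API. Assumed layout: a header line
    "<num_words> <dim>", then one "<word> <v1> ... <v_dim>" line per word.
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        _num_words, dim = map(int, f.readline().split())
        for line in f:
            # Split on single spaces only; fastText tokens contain no ASCII whitespace.
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                continue  # skip lines that do not match the header dimension
            vectors[word] = [float(x) for x in values]
    return vectors


# Tiny demo file in the assumed layout (2 words, 3 dimensions).
with open("sample.vec", "w", encoding="utf-8") as f:
    f.write("2 3\nগ্রাম 0.1 0.2 0.3\nশহর 0.4 0.5 0.6\n")

vecs = load_vec("sample.vec")
print(len(vecs), vecs["শহর"])  # 2 [0.4, 0.5, 0.6]
```

Python lists of floats are memory-hungry; for the full 1,171,011-word vocabulary, a dedicated loader such as gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` is likely more practical.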