	---
	language: hu
	license: apache-2.0
	datasets:
	- common_crawl
	- wikipedia
	tags:
	- byte representation
	- gradient boosting
	- hungarian
	---

	# Charmen-Electra

	A byte-based transformer model trained on Hungarian text. To use the model, you need a custom tokenizer, which is available at [https://github.com/szegedai/byte-offset-tokenizer](https://github.com/szegedai/byte-offset-tokenizer).
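
To illustrate what a byte-level input looks like, here is a minimal sketch that turns Hungarian text into UTF-8 byte values using only the Python standard library. It is not the custom tokenizer linked above: the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical, and the real tokenizer may add special tokens, offsets, or padding that this sketch does not model.

```python
from typing import List


def text_to_byte_ids(text: str) -> List[int]:
    """Hypothetical helper: encode text as UTF-8 byte values (0-255)."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(byte_ids: List[int]) -> str:
    """Hypothetical helper: decode UTF-8 byte values back into text."""
    return bytes(byte_ids).decode("utf-8")


# Accented Hungarian letters such as "ő" occupy two bytes in UTF-8,
# so the byte sequence is longer than the character sequence.
ids = text_to_byte_ids("őz")
print(ids)                    # [197, 145, 122]
print(byte_ids_to_text(ids))  # őz
```

Because every input reduces to values in the range 0 to 255, a byte-level model needs no language-specific vocabulary, at the cost of longer input sequences for accented text.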

	Because the model uses a custom architecture with Gradient Boosting and Down- and Up-Sampling, you must enable trusted remote code when loading it:

	```python
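	from transformers import AutoModel

	# trust_remote_code=True lets transformers run the custom model code shipped in the repository
	model = AutoModel.from_pretrained("SzegedAI/charmen-electra", trust_remote_code=True)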
	```
	# Acknowledgement
	[![Artificial Intelligence - National Laboratory - Hungary](https://milab.tk.hu/uploads/images/milab_logo_en.png)](https://mi.nemzetilabor.hu/)