	---
	library_name: transformers
	language:
	- pt
	license: cc-by-4.0
	tags:
	- text-generation
	- pytorch
	- LLM
	- Portuguese
	- mamba
	datasets:
	- nicholasKluge/Pt-Corpus-Instruct
	inference:
	  parameters:
	    repetition_penalty: 1.2
	    temperature: 0.8
	    top_k: 50
	    top_p: 0.85
	    max_new_tokens: 150
	widget:
	  - text: "O Natal é uma"
	    example_title: Exemplo
	  - text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
	    example_title: Exemplo
	  - text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
	    example_title: Exemplo
	pipeline_tag: text-generation
	---

	# Mambarim-110M

	<p align="center">
	<img width="350" alt="Mambarim Logo" src="https://raw.githubusercontent.com/DominguesM/mambarim-110M/main/assets/mambarim-bg.png">
	</p>

	<br>

	## Model Summary

	Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
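
To make the contrast with attention concrete, here is a toy sketch of a state-space recurrence. It is not the Mamba layer: Mamba uses input-dependent (selective) parameters and a hardware-aware scan, while this example assumes a single scalar state with fixed, hypothetical coefficients `a`, `b`, and `c`. The point it illustrates is that the model carries a fixed-size state from token to token instead of attending over all previous tokens.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    a, b, and c are illustrative constants, not values from the model.
    """
    h = 0.0  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:
        h = a * h + b * x  # update the state with the current input
        ys.append(c * h)   # read the output from the state
    return ys


# An impulse at the first step decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

Because each step only touches the state `h`, the cost per generated token stays constant as the sequence grows.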

	WIP

	## Details

	- Architecture: a Mamba model pre-trained via causal language modeling
	- Size: 119,930,880 parameters
	- Context length: 2048 tokens
	- Dataset: [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
	- Language: Portuguese
	- Number of steps: 758,423

	The [source code](https://github.com/DominguesM/mambarim-110M/) used to train this model is available on GitHub.
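
As a rough sanity check, the figures above can be combined to estimate how many tokens each training step processed. This is a back-of-the-envelope sketch, not a statement of the training configuration: it assumes a single pass over the dataset, treats the rounded 6.2B token count as exact, and assumes every sequence fills the 2048-token context.

```python
# Figures taken from the Details list above.
total_tokens = 6.2e9      # dataset size (rounded in the source)
num_steps = 758_423       # reported number of training steps
context_length = 2048     # tokens per sequence

# Assumption: one epoch, so each token is seen exactly once.
tokens_per_step = total_tokens / num_steps
sequences_per_step = tokens_per_step / context_length

print(f"tokens per step: {tokens_per_step:.0f}")
print(f"sequences per step: {sequences_per_step:.2f}")
```

Under these assumptions each step covers roughly 8,175 tokens, or about four full-length sequences.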

	## Intended Uses

	WIP

	## Out-of-scope Use

	WIP

	## Basic usage

	Mamba support was added in `transformers` 4.39.0. If your installed version is older, upgrade it or install `transformers` from `main`:

	```bash
	pip install git+https://github.com/huggingface/transformers@main
	```

	We also recommend installing both `causal-conv1d` and `mamba-ssm`, which provide the optimized kernels:

	```bash
	pip install "causal-conv1d>=1.2.0"
	pip install mamba-ssm
	```
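
Before loading the model, it can help to confirm that the optional kernel packages are importable. The sketch below assumes the import names `mamba_ssm` and `causal_conv1d` for the two packages; the helper `kernel_availability` is a hypothetical name introduced for illustration. When these packages are missing, `transformers` is expected to fall back to a slower implementation rather than fail.

```python
import importlib.util

# Import names assumed for the `mamba-ssm` and `causal-conv1d` packages.
OPTIONAL_KERNEL_MODULES = ("mamba_ssm", "causal_conv1d")


def kernel_availability(modules=OPTIONAL_KERNEL_MODULES):
    """Return a mapping of module name to whether it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


for name, available in kernel_availability().items():
    status = "found" if available else "missing"
    print(f"{name}: {status}")
```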

	You can use the classic `generate` API:

	```python
	>>> from transformers import AutoTokenizer, MambaForCausalLM
	>>> # Load the tokenizer and the pre-trained weights from the Hub
	>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
	>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
	>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
	>>> # Sample a short continuation; the text varies between runs because do_sample=True
	>>> out = model.generate(
	...     input_ids,
	...     repetition_penalty=1.2,
	...     temperature=0.8,
	...     top_k=50,
	...     top_p=0.85,
	...     do_sample=True,
	...     max_new_tokens=10,
	... )
	>>> print(tokenizer.batch_decode(out))
	['<s> O Natal é uma data em que as pessoas passam horas de lazer e']
	```
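
The same sampling settings can also be passed through the `pipeline` API. The following is a minimal sketch, assuming the `text-generation` pipeline accepts this checkpoint as it does other causal language models; the helper name `generate` and the `GENERATION_KWARGS` dictionary are introduced here for illustration, with values copied from the `inference.parameters` metadata of this model card.

```python
# Sampling settings copied from the model card metadata.
GENERATION_KWARGS = {
    "repetition_penalty": 1.2,
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.85,
    "max_new_tokens": 150,
    "do_sample": True,  # required for temperature, top_k, and top_p to apply
}


def generate(prompt, model_id="dominguesm/mambarim-110m"):
    """Return the prompt plus a sampled continuation."""
    # Imported here so the settings above can be inspected
    # without loading transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Calling `generate("O Natal é uma")` downloads the checkpoint on first use and returns a different continuation on each run, since sampling is enabled.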

	## Benchmarks

	Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese implementation of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt) (created by [Eduardo Garcia](https://github.com/eduagarcia/lm-evaluation-harness-pt)).

	Detailed results are available [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/mambarim-110m).

	| Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FAQNAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
	| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
	| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
	| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
	| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1) | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
	| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
	| [mambarim-110m](https://huggingface.co/dominguesm/mambarim-110m) | 14.16 | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
	| [GlorIA-1.3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B) | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |
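
The Average column appears to be the unweighted mean of the nine task scores. That is an inference from the numbers in the table rather than something the evaluation harness documentation is quoted as saying here. A quick check using the mambarim-110m row:

```python
from statistics import mean

# Task scores for mambarim-110m, copied from the table above.
mambarim_scores = {
    "ENEM": 18.4,
    "BLUEX": 10.57,
    "OAB Exams": 21.87,
    "ASSIN2 RTE": 16.09,
    "ASSIN2 STS": 1.89,
    "FAQNAD NLI": 9.29,
    "HateBR": 15.75,
    "PT Hate Speech": 17.77,
    "tweetSentBR": 15.79,
}

# Unweighted mean over all tasks, rounded like the table.
average = round(mean(mambarim_scores.values()), 2)
print(average)  # 14.16
```

The same calculation reproduces the reported average for the other rows as well.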