---
library_name: transformers
language:
- pt
license: cc-by-4.0
tags:
- text-generation
- pytorch
- LLM
- Portuguese
- mamba
datasets:
- nicholasKluge/Pt-Corpus-Instruct
inference:
  parameters:
    repetition_penalty: 1.2
    temperature: 0.8
    top_k: 50
    top_p: 0.85
    max_new_tokens: 150
widget:
- text: "O Natal é uma"
  example_title: Exemplo
- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
  example_title: Exemplo
- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
  example_title: Exemplo
pipeline_tag: text-generation
---

# Mambarim-110M

<p align="center">
  <img width="350" alt="Camarim Logo" src="https://raw.githubusercontent.com/DominguesM/mambarim-110M/main/assets/mambarim-bg.png">
</p>

<br>

## Model Summary

**Mambarim-110M** is the first Portuguese language model built on the Mamba state-space architecture rather than a Transformer.

WIP

## Details

- **Architecture:** a Mamba model pre-trained via causal language modeling
- **Size:** 119,930,880 parameters
- **Context length:** 2048 tokens
- **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
- **Language:** Portuguese
- **Number of steps:** 758,423

The [source code](https://github.com/DominguesM/mambarim-110M/) used to train this model is available in its GitHub repository.
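
As a quick sanity check, the details above can be read back from the published configuration. A minimal sketch, assuming the standard `transformers` `MambaConfig` field names:

```python
from transformers import AutoConfig, MambaForCausalLM

# Read the published configuration (no weight download needed)
config = AutoConfig.from_pretrained("dominguesm/mambarim-110m")
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)

# Counting parameters directly should match the 119,930,880 figure above
model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
print(sum(p.numel() for p in model.parameters()))
```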

## Intended Uses

WIP

## Out-of-scope Use

WIP

## Basic usage

You need to install `transformers` from the `main` branch until `transformers==4.39.0` is released.

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal-conv1d` and `mamba-ssm`, which provide the optimized CUDA kernels (without them, inference falls back to a slower sequential implementation):

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```

You can use the classic `generate` API:

```python
>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
```
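
The same generation can also go through the high-level `pipeline` API. A sketch, with sampling parameters mirroring the widget defaults declared in the front matter:

```python
from transformers import pipeline

# Wraps the tokenizer and model shown above in a single object
generator = pipeline("text-generation", model="dominguesm/mambarim-110m")

output = generator(
    "O Natal é uma",
    repetition_penalty=1.2,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
    do_sample=True,
    max_new_tokens=50,
)
print(output[0]["generated_text"])
```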

## Benchmarks

Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese implementation of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt), created by [Eduardo Garcia](https://github.com/eduagarcia).

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/mambarim-110m).
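
For orientation, an evaluation run with the harness looks roughly like the sketch below. The entry point, task names, and flags are assumptions based on upstream `lm-evaluation-harness` conventions; check the fork's README for the exact invocation:

```bash
# Hypothetical invocation; task names/flags may differ in the Portuguese fork
lm_eval --model hf \
    --model_args pretrained=dominguesm/mambarim-110m \
    --tasks enem_challenge,bluex,oab_exams,assin2_rte,assin2_sts \
    --batch_size 8
```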

| Model                                  | **Average** | ENEM  | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | **Architecture**   |
| -------------------------------------- | ----------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | ------------------ |
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m)      | 28.86       | 20.15 | 25.73 | 27.02     | 53.61      | 13         | 46.41      | 33.59  | 22.99          | 17.28       | LlamaForCausalLM   |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m)      | 28.2        | 19.24 | 23.09 | 22.37     | 53.97      | 0.24       | 43.97      | 36.92  | 42.63          | 11.39       | LlamaForCausalLM   |
| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1)                 | 26.24       | 21.34 | 25.17 | 25.06     | 33.57      | 11.35      | 43.97      | 41.5   | 22.99          | 11.24       | MixtralForCausalLM |
| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49       | 20.29 | 25.45 | 26.74     | 43.77      | 4.52       | 34         | 33.49  | 22.99          | 18.13       | LlamaForCausalLM   |
| [**mambarim-110m**](https://huggingface.co/dominguesm/mambarim-110m)           | **14.16**   | 18.4  | 10.57 | 21.87     | 16.09      | 1.89       | 9.29       | 15.75  | 17.77          | 15.79       | **MambaForCausalLM**   |
| [GlorIA-1.3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B)       | 4.09        | 1.89  | 3.2   | 5.19      | 0          | 2.32       | 0.26       | 0.28   | 23.52          | 0.19        | GPTNeoForCausalLM  |