---
language:
- en
- fr
- ar
license: other
library_name: transformers
tags:
- orpo
- llama 3
- rlhf
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---
# OrpoLlama-3-8B
![](https://i.imgur.com/ZHwzQvI.png)
This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).
It's a successful fine-tune that follows the ChatML template!
**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B
## Application
This model uses a context window of 8k. It was trained with the ChatML template.
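For reference, ChatML wraps each message in `<|im_start|>` and `<|im_end|>` markers. The sketch below builds such a prompt by hand. The helper name `to_chatml` is hypothetical, and the exact string produced by this model's tokenizer may differ (for example, in whether a BOS token is prepended), so prefer `tokenizer.apply_chat_template` in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the generic ChatML layout.

    Hypothetical helper for illustration; not part of transformers.
    """
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = to_chatml(messages)
print(prompt)
```

Keep the prompt plus `max_new_tokens` within the 8k context window mentioned above.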
## Quantized models
Thanks to bartowski, solidrust, and LoneStriker for the quantized models.
* **GGUF**: https://huggingface.co/bartowski/OrpoLlama-3-8B-GGUF
* **AWQ**: https://huggingface.co/solidrust/OrpoLlama-3-8B-AWQ
* **EXL2**:
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-3.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-4.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-5.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-6.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-8.0bpw-h6-exl2
## Evaluation
### Nous
OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA benchmarks.
Evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval); see the full leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [gist](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [gist](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f) | **48.63** | **34.17** | **70.59** | **52.39** | **37.36** |
| [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [gist](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | 46.76 | 31.56 | 70.19 | 48.11 | 37.17 |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [gist](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847) | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
`mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1K samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)).
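The Average column appears to be the unweighted mean of the four benchmark scores. This is an inference from the numbers in the table, not a formula stated by the source. A quick check, with the scores copied from the rows above:

```python
# Per-model scores in column order: AGIEval, GPT4All, TruthfulQA, Bigbench.
scores = {
    "meta-llama/Meta-Llama-3-8B-Instruct": [41.22, 69.86, 51.65, 42.64],
    "mlabonne/OrpoLlama-3-8B": [34.17, 70.59, 52.39, 37.36],
    "mlabonne/OrpoLlama-3-8B-1k": [31.56, 70.19, 48.11, 37.17],
    "meta-llama/Meta-Llama-3-8B": [31.1, 69.95, 43.91, 36.7],
}

# Assumption: "Average" is the plain arithmetic mean of the four benchmarks.
averages = {name: sum(values) / len(values) for name, values in scores.items()}

for name, average in averages.items():
    print(f"{name}: {average:.2f}")
```

Each computed mean agrees with the reported Average to within rounding.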
### Open LLM Leaderboard
TBD.
## Training curves
You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)
## Usage
```python
# Notebook users: run `!pip install -qU transformers accelerate` in a separate cell first.
import torch
import transformers
from transformers import AutoTokenizer

model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template (ChatML).
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and place it on the available device(s).
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion; the returned text includes the prompt followed by the reply.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```