
UltraLlama-3.1-8B

UltraLlama-3.1-8B is a Llama 3.1 8B model fine-tuned on a Magpie-generated dataset, as a way to measure the quality of that data. Benchmark results:

| Model | MMLU | HellaSwag | ARC-C | GSM8K | TruthfulQA | Winogrande | IFEval | MMLU-Pro | MATH Lvl 5 | GPQA | MuSR | BBH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Meta-Llama-3.1-8B | 65.28 | 82.09 | 59.22 | 51.02 | 45.15 | 77.58 | 11.45 | 32.74 | 4.38 | 30.93 | 7.98 | 46.77 |
| Meta-Llama-3-8B-Instruct | 65.60 | 78.79 | 61.95 | 75.28 | 51.66 | 75.77 | 47.43 | 5.87 | 7.95 | 30.11 | 37.92 | 49.04 |
| FineLlama-3.1-8B | 62.22 | 80.30 | 55.55 | 51.02 | 49.51 | 75.30 | 1.68 | 30.90 | 4.12 | 27.45 | 35.77 | 44.22 |
| UltraLlama-3.1-8B | 54.36 | 74.98 | 55.46 | 51.10 | 49.93 | 72.05 | 10.19 | 26.63 | 3.08 | 25.47 | 40.93 | 42.44 |

Magpie underperforms FineTome-100k: the data quality looks acceptable, but not on par with high-quality open-source datasets.
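A minimal inference sketch with the transformers library, assuming the model uses the standard Llama 3.1 chat template (the prompt and generation settings below are illustrative, not part of this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/UltraLlama-3.1-8B"

# Load tokenizer and model in bfloat16 (matches the checkpoint's tensor type)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example prompt; assumes the standard Llama 3.1 chat template is bundled with the tokenizer
messages = [{"role": "user", "content": "Explain what the Magpie data generation method is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```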

Safetensors · 8.03B params · BF16
