metadata

library_name: transformers
license: llama3.1
base_model: Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - Magpie-Align/MagpieLM-4B-DPO-Data-v0.1
model-index:
  - name: Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix
    results: []

Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix

This model is a fine-tuned version of Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K, trained with Direct Preference Optimization (DPO) on the Magpie-Align/MagpieLM-4B-DPO-Data-v0.1 dataset. It achieves the following results on the evaluation set:

Loss: 0.3866
Rewards/chosen: -5.1623
Rewards/rejected: -6.8930
Rewards/accuracies: 0.8060
Rewards/margins: 1.7307
Logps/rejected: -1154.4679
Logps/chosen: -990.1328
Logits/rejected: -0.6102
Logits/chosen: -0.6705
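
As a quick sanity check on how these numbers relate, the reported margin should equal the chosen reward minus the rejected reward. The sketch below assumes the margin is logged as the mean of per-pair differences, so the difference of the two reported means should agree with it up to rounding; the variable names are illustrative and are not taken from the training code.

```python
# Final evaluation rewards, copied from the list above.
rewards_chosen = -5.1623
rewards_rejected = -6.8930
reported_margin = 1.7307

# Assumption: Rewards/margins is the mean of (chosen - rejected), so the
# difference of the two reported means should match it up to rounding.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported value: {abs(margin - reported_margin) < 1e-4}")
```

Both rewards are negative while the margin is positive, which means the chosen responses are penalized less than the rejected ones relative to the reference model.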

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 128
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
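
The total batch sizes listed above follow from the per-device values. A minimal sketch of the arithmetic, assuming train_batch_size and eval_batch_size are per-device values and that gradient accumulation applies only to training:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2               # assumed to be per device
eval_batch_size = 4                # assumed to be per device
num_devices = 4
gradient_accumulation_steps = 16

# One optimizer step consumes every device's micro-batch for each
# accumulation step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation runs without gradient accumulation.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 16
```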

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.686	0.0653	100	0.6856	-0.0491	-0.0616	0.6480	0.0125	-471.3315	-478.8181	-0.7034	-0.7427
0.6218	0.1306	200	0.6277	-0.6128	-0.7720	0.6960	0.1591	-542.3653	-535.1920	-0.7771	-0.8125
0.5705	0.1959	300	0.5545	-2.4738	-3.0052	0.7270	0.5314	-765.6894	-721.2881	-0.7894	-0.8230
0.4606	0.2612	400	0.5081	-2.6780	-3.3782	0.7560	0.7002	-802.9893	-741.7116	-0.6813	-0.7247
0.4314	0.3266	500	0.4787	-3.6697	-4.6026	0.7630	0.9329	-925.4283	-840.8740	-0.6189	-0.6691
0.449	0.3919	600	0.4533	-3.7414	-4.8019	0.7820	1.0604	-945.3563	-848.0514	-0.6157	-0.6681
0.4538	0.4572	700	0.4350	-4.3858	-5.6549	0.7890	1.2690	-1030.6561	-912.4920	-0.5789	-0.6331
0.35	0.5225	800	0.4186	-4.7129	-6.1662	0.8010	1.4533	-1081.7843	-945.1964	-0.5778	-0.6347
0.4153	0.5878	900	0.4108	-4.9836	-6.5320	0.7970	1.5484	-1118.3677	-972.2631	-0.5895	-0.6474
0.3935	0.6531	1000	0.3999	-4.4303	-5.9370	0.8110	1.5067	-1058.8646	-916.9379	-0.6016	-0.6598
0.3205	0.7184	1100	0.3950	-5.1884	-6.8827	0.8010	1.6943	-1153.4371	-992.7452	-0.5846	-0.6452
0.3612	0.7837	1200	0.3901	-5.0426	-6.7179	0.8040	1.6753	-1136.9619	-978.1701	-0.6046	-0.6637
0.3058	0.8490	1300	0.3877	-5.1224	-6.8428	0.8040	1.7204	-1149.4465	-986.1475	-0.6087	-0.6690
0.3467	0.9144	1400	0.3871	-5.2335	-6.9809	0.8090	1.7474	-1163.2629	-997.2610	-0.6071	-0.6672
0.3197	0.9797	1500	0.3867	-5.1502	-6.8793	0.8080	1.7291	-1153.0979	-988.9237	-0.6120	-0.6722
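
To relate the step counts in this table to the learning-rate settings, the following sketch reconstructs a cosine schedule with linear warmup in plain Python. It is an approximation under stated assumptions, not the exact scheduler used in training: the total step count is inferred from the last logged row (step 1500 at epoch 0.9797), the warmup length is taken as the ceiling of 10% of that total, and the decay is assumed to reach zero at the end of the epoch.

```python
import math

PEAK_LR = 2e-07       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio

# Assumption: step 1500 is logged at epoch 0.9797, so one epoch is
# roughly 1500 / 0.9797 optimizer steps. The card does not state this.
TOTAL_STEPS = round(1500 / 0.9797)
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few of the logged steps.
for step in (0, 100, WARMUP_STEPS, 800, 1500):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the first logged row (step 100) still falls inside the warmup phase, which may help explain why the validation loss has barely moved at that point.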

Framework versions

Transformers 4.44.2
PyTorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.19.1