---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: llama2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0
    results: []
---

# Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), DPO-trained on an unspecified preference dataset. It achieves the following results on the evaluation set:

- Loss: 0.9728
- Rewards/chosen: -2.0556
- Rewards/rejected: -1.8953
- Rewards/accuracies: 0.4167
- Rewards/margins: -0.1602
- Logps/rejected: -146.9038
- Logps/chosen: -163.5856
- Logits/rejected: -0.8198
- Logits/chosen: -0.8110
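For orientation: under the standard DPO formulation (the trl and dpo tags suggest TRL's DPOTrainer; the β used for this run is not recorded in this card), the reward columns are β-scaled log-probability ratios between the policy and a frozen reference model, and the loss is the negative log-sigmoid of their margin:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\big)
$$

Rewards/margins is Rewards/chosen minus Rewards/rejected; with the rounded figures above, -2.0556 - (-1.8953) = -0.1603, matching the reported -0.1602 up to rounding.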

## Model description

More information needed

## Intended uses & limitations

More information needed
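Since the metadata lists library_name: peft with base model meta-llama/Llama-2-7b-hf, inference would typically attach this adapter to the base model. Below is a minimal sketch; the adapter repo id is an assumption inferred from this card's name and should be verified, as are the fp16/GPU settings.

```python
# Minimal inference sketch (assumptions: adapter repo id below, fp16 weights,
# device_map="auto"; none of these are confirmed by the card itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "LBK95/Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # attach the DPO-trained PEFT adapter
model.eval()

inputs = tokenizer("Tell me about direct preference optimization.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that access to the gated meta-llama/Llama-2-7b-hf weights is required (license: llama2).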

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
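As a reference, the list above maps onto transformers.TrainingArguments as sketched below. The trl and dpo tags suggest these were consumed by TRL's DPOTrainer; output_dir is an assumption, and "Adam" corresponds to the default AdamW optimizer in Transformers.

```python
# Hedged sketch: the reported hyperparameters expressed as TrainingArguments.
# output_dir is assumed; single-device training is assumed so that
# total_train_batch_size = per_device_train_batch_size * gradient_accumulation_steps = 4.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size: 2 * 2 = 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```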

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6766        | 0.2994 | 78   | 0.7200          | 0.0401         | 0.0730           | 0.4167             | -0.0329         | -127.2202      | -142.6290    | -0.3528         | -0.3393       |
| 0.6875        | 0.5988 | 156  | 0.6657          | -0.4080        | -0.4881          | 0.5833             | 0.0801          | -132.8311      | -147.1099    | -0.3720         | -0.3575       |
| 0.7999        | 0.8983 | 234  | 0.6842          | -0.3659        | -0.4094          | 0.6667             | 0.0435          | -132.0449      | -146.6892    | -0.3674         | -0.3517       |
| 0.4879        | 1.1977 | 312  | 0.6694          | -0.2237        | -0.2979          | 0.4167             | 0.0742          | -130.9293      | -145.2672    | -0.3979         | -0.3821       |
| 0.6233        | 1.4971 | 390  | 0.6523          | -0.9992        | -1.1797          | 0.5000             | 0.1804          | -139.7471      | -153.0225    | -0.5012         | -0.4885       |
| 0.4034        | 1.7965 | 468  | 0.7021          | -0.9141        | -1.0257          | 0.4167             | 0.1116          | -138.2080      | -152.1710    | -0.4511         | -0.4394       |
| 0.1778        | 2.0960 | 546  | 0.7896          | -1.2322        | -1.2047          | 0.4167             | -0.0275         | -139.9971      | -155.3521    | -0.5752         | -0.5642       |
| 0.2732        | 2.3954 | 624  | 0.9364          | -1.8694        | -1.7281          | 0.4167             | -0.1412         | -145.2318      | -161.7236    | -0.7728         | -0.7633       |
| 0.1812        | 2.6948 | 702  | 0.9683          | -2.0710        | -1.9135          | 0.4167             | -0.1575         | -147.0860      | -163.7400    | -0.8137         | -0.8049       |
| 0.1798        | 2.9942 | 780  | 0.9728          | -2.0556        | -1.8953          | 0.4167             | -0.1602         | -146.9038      | -163.5856    | -0.8198         | -0.8110       |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1