---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: llama2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0
    results: []
---

# Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), DPO-trained on an unspecified preference dataset. It achieves the following results on the evaluation set:

- Loss: 0.9728
- Rewards/chosen: -2.0556
- Rewards/rejected: -1.8953
- Rewards/accuracies: 0.4167
- Rewards/margins: -0.1602
- Logps/rejected: -146.9038
- Logps/chosen: -163.5856
- Logits/rejected: -0.8198
- Logits/chosen: -0.8110
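For orientation: under the standard DPO formulation (the trl and dpo tags suggest TRL's DPOTrainer; the β used for this run is not recorded in this card), the reward columns are β-scaled log-probability ratios between the policy and a frozen reference model, and the loss is the negative log-sigmoid of their margin:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\big)
$$

Rewards/margins is Rewards/chosen minus Rewards/rejected; with the rounded figures above, -2.0556 - (-1.8953) = -0.1603, matching the reported -0.1602 up to rounding.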

## Model description

More information needed

## Intended uses & limitations

More information needed
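Since the metadata lists library_name: peft with base model meta-llama/Llama-2-7b-hf, inference would typically attach this adapter to the base model. Below is a minimal sketch; the adapter repo id is an assumption inferred from this card's name and should be verified, as are the fp16/GPU settings.

```python
# Minimal inference sketch (assumptions: adapter repo id below, fp16 weights,
# device_map="auto"; none of these are confirmed by the card itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "LBK95/Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # attach the DPO-trained PEFT adapter
model.eval()

inputs = tokenizer("Tell me about direct preference optimization.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that access to the gated meta-llama/Llama-2-7b-hf weights is required (license: llama2).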

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
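As a reference, the list above maps onto transformers.TrainingArguments as sketched below. The trl and dpo tags suggest these were consumed by TRL's DPOTrainer; output_dir is an assumption, and "Adam" corresponds to the default AdamW optimizer in Transformers.

```python
# Hedged sketch: the reported hyperparameters expressed as TrainingArguments.
# output_dir is assumed; single-device training is assumed so that
# total_train_batch_size = per_device_train_batch_size * gradient_accumulation_steps = 4.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V2.0",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size: 2 * 2 = 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```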

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6766        | 0.2994 | 78   | 0.7200          | 0.0401         | 0.0730           | 0.4167             | -0.0329         | -127.2202      | -142.6290    | -0.3528         | -0.3393       |
| 0.6875        | 0.5988 | 156  | 0.6657          | -0.4080        | -0.4881          | 0.5833             | 0.0801          | -132.8311      | -147.1099    | -0.3720         | -0.3575       |
| 0.7999        | 0.8983 | 234  | 0.6842          | -0.3659        | -0.4094          | 0.6667             | 0.0435          | -132.0449      | -146.6892    | -0.3674         | -0.3517       |
| 0.4879        | 1.1977 | 312  | 0.6694          | -0.2237        | -0.2979          | 0.4167             | 0.0742          | -130.9293      | -145.2672    | -0.3979         | -0.3821       |
| 0.6233        | 1.4971 | 390  | 0.6523          | -0.9992        | -1.1797          | 0.5000             | 0.1804          | -139.7471      | -153.0225    | -0.5012         | -0.4885       |
| 0.4034        | 1.7965 | 468  | 0.7021          | -0.9141        | -1.0257          | 0.4167             | 0.1116          | -138.2080      | -152.1710    | -0.4511         | -0.4394       |
| 0.1778        | 2.0960 | 546  | 0.7896          | -1.2322        | -1.2047          | 0.4167             | -0.0275         | -139.9971      | -155.3521    | -0.5752         | -0.5642       |
| 0.2732        | 2.3954 | 624  | 0.9364          | -1.8694        | -1.7281          | 0.4167             | -0.1412         | -145.2318      | -161.7236    | -0.7728         | -0.7633       |
| 0.1812        | 2.6948 | 702  | 0.9683          | -2.0710        | -1.9135          | 0.4167             | -0.1575         | -147.0860      | -163.7400    | -0.8137         | -0.8049       |
| 0.1798        | 2.9942 | 780  | 0.9728          | -2.0556        | -1.8953          | 0.4167             | -0.1602         | -146.9038      | -163.5856    | -0.8198         | -0.8110       |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1