
DPO_TEST_1

This model is a DPO fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0229
  • Rewards/chosen: -740.3076
  • Rewards/rejected: -1059.6395
  • Rewards/accuracies: 0.9988
  • Rewards/margins: 319.3320
  • Logps/rejected: -10817.7158
  • Logps/chosen: -7838.3896
  • Logits/rejected: -32.6170
  • Logits/chosen: -26.3151
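The reward metrics above are internally consistent with the standard DPO formulation (as in TRL's default "sigmoid" loss, which this card does not explicitly confirm but the metric names suggest): the margin is chosen reward minus rejected reward, and the per-pair loss is -log(sigmoid(margin)). A minimal sketch, using only the final evaluation numbers from this card:

```python
import math

# Final evaluation metrics from the list above
rewards_chosen = -740.3076
rewards_rejected = -1059.6395

# In DPO, the reward margin is chosen minus rejected; the rewards
# already include the beta scaling factor.
margin = rewards_chosen - rewards_rejected  # ~319.332, matching Rewards/margins

def dpo_pair_loss(margin: float) -> float:
    """Numerically stable -log(sigmoid(x)) == log(1 + exp(-x))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(round(margin, 4))
print(dpo_pair_loss(margin))  # effectively 0 for such a large margin
```

Note that the reported evaluation loss (0.0229) is an average over many pairs with varying margins, so it need not equal the loss computed from the averaged rewards.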

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 25806
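The schedule implied by these hyperparameters can be sketched as follows. This mirrors the linear warmup plus linear decay of transformers' `get_linear_schedule_with_warmup` (an assumption based on `lr_scheduler_type: linear`, not stated explicitly in the card):

```python
# Hyperparameters from the list above
LEARNING_RATE = 2e-4
WARMUP_STEPS = 2
TRAINING_STEPS = 25806

def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    return LEARNING_RATE * max(
        0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    )

# Effective batch size: per-device batch * gradient accumulation steps
total_train_batch_size = 2 * 2  # == 4, as listed above

print(lr_at(0))               # 0.0 (start of warmup)
print(lr_at(WARMUP_STEPS))    # 2e-4 (peak, after the 2 warmup steps)
print(lr_at(TRAINING_STEPS))  # 0.0 (fully decayed)
```

With only 2 warmup steps out of 25,806 total, the schedule is almost entirely decay.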

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|-------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 37.7926       | 0.67  | 2867  | 2.8916          | -149.7567      | -197.0506        | 0.9359             | 47.2939         | -2191.8269     | -1932.8804   | 38.0669         | 4.1036        |
| 84.7229       | 1.33  | 5734  | 2.0247          | -327.4202      | -450.9706        | 0.9656             | 123.5504        | -4731.0264     | -3709.5146   | -17.6232        | -16.2614      |
| 0.4302        | 2.0   | 8601  | 0.2490          | -391.4300      | -536.8747        | 0.9923             | 145.4447        | -5590.0679     | -4349.6123   | -13.6537        | -12.7337      |
| 0.6952        | 2.67  | 11468 | 0.0587          | -606.4489      | -775.4740        | 0.9970             | 169.0251        | -7976.0605     | -6499.8027   | 8.1646          | -0.2018       |
| 0.2119        | 3.33  | 14335 | 0.2843          | -641.6364      | -925.0908        | 0.9907             | 283.4543        | -9472.2285     | -6851.6772   | -11.2088        | -13.0496      |
| 0.129         | 4.0   | 17202 | 0.1065          | -706.7910      | -1019.4420       | 0.9958             | 312.6511        | -10415.7412    | -7503.2227   | 29.4650         | 10.0032       |
| 0.1046        | 4.67  | 20069 | 0.1005          | -758.2514      | -1105.3041       | 0.9977             | 347.0525        | -11274.3594    | -8017.8281   | -37.3526        | -28.3912      |
| 0.0656        | 5.33  | 22936 | 0.0241          | -790.2775      | -1078.3324       | 0.9986             | 288.0548        | -11004.6445    | -8338.0889   | -7.1017         | -13.6854      |
| 0.0           | 6.0   | 25803 | 0.0229          | -740.3076      | -1059.6395       | 0.9988             | 319.3320        | -10817.7158    | -7838.3896   | -32.6170        | -26.3151      |
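As a quick sanity check on the table above, the logged reward margin at every checkpoint equals chosen reward minus rejected reward, up to 4-decimal rounding:

```python
# (Rewards/chosen, Rewards/rejected, Rewards/margins) per checkpoint,
# copied from the training-results table above
rows = [
    (-149.7567, -197.0506, 47.2939),
    (-327.4202, -450.9706, 123.5504),
    (-391.4300, -536.8747, 145.4447),
    (-606.4489, -775.4740, 169.0251),
    (-641.6364, -925.0908, 283.4543),
    (-706.7910, -1019.4420, 312.6511),
    (-758.2514, -1105.3041, 347.0525),
    (-790.2775, -1078.3324, 288.0548),
    (-740.3076, -1059.6395, 319.3320),
]

for chosen, rejected, margin in rows:
    # Allow small rounding error from the 4-decimal logged values
    assert abs((chosen - rejected) - margin) < 1e-3
```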

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.0.1
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model tree for RAIJAY/NBERT_DPO

This model is a PEFT adapter of mistralai/Mistral-7B-Instruct-v0.2.