mistral-dpo

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ on an unspecified dataset (the card records the dataset name as None). It achieves the following results on the evaluation set, taken from the final evaluation at step 250:

  • Loss: 0.7137
  • Rewards/chosen: 1.0722
  • Rewards/rejected: 0.8993
  • Rewards/accuracies: 0.5962
  • Rewards/margins: 0.1729
  • Logps/rejected: -188.1420
  • Logps/chosen: -180.8729
  • Logits/rejected: -2.4148
  • Logits/chosen: -2.4328
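
The reward figures above are internally consistent: the reported margin equals the chosen reward minus the rejected reward (1.0722 - 0.8993 = 0.1729). The following is a minimal sketch of how such per-pair metrics are commonly aggregated, assuming the metric names follow the usual DPO logging convention with a sigmoid loss and no label smoothing; the card does not name the trainer, so treat this as an illustration only. The function name `dpo_pair_metrics` and the toy inputs are hypothetical.

```python
import math


def dpo_pair_metrics(chosen_rewards, rejected_rewards):
    """Aggregate per-pair implicit rewards into DPO-style summary metrics.

    Assumption: each reward is already scaled by beta, so the per-pair
    sigmoid DPO loss is -log(sigmoid(chosen - rejected)).
    """
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)
    # -log(sigmoid(m)) == log(1 + exp(-m))
    losses = [math.log1p(math.exp(-m)) for m in margins]
    return {
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response is ranked higher.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "loss": sum(losses) / n,
    }


# Hypothetical toy batch: one pair ranked correctly, one ranked incorrectly.
toy = dpo_pair_metrics([1.0, 0.5], [0.0, 1.5])
```

Because the loss is averaged per pair rather than computed from the mean margin, a run can show a positive mean margin while the mean loss stays above ln 2 ≈ 0.693, as it does here (0.7137).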

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 250
  • mixed_precision_training: Native AMP
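
To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear schedule with 2 warmup steps, a peak of 0.0002, and 250 training steps. It assumes the common linear-warmup, linear-decay shape (as in Transformers' `get_linear_schedule_with_warmup`); the card only says `linear`, so the exact scheduler is an assumption, and `linear_lr` is a hypothetical helper name.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=250):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr (at the end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at each evaluation step listed in the training results.
lr_at_eval_steps = {step: linear_lr(step) for step in range(10, 251, 10)}
```

With only 2 warmup steps, the rate reaches its peak almost immediately and then falls to zero by step 250.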

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0.7053 | 0.0 | 10 | 0.6892 | 0.0818 | 0.0724 | 0.5962 | 0.0094 | -196.4110 | -190.7768 | -2.3946 | -2.4152
0.6805 | 0.0 | 20 | 0.6918 | -0.0354 | -0.0484 | 0.6346 | 0.0130 | -197.6190 | -191.9491 | -2.4006 | -2.4157
0.834 | 0.0 | 30 | 0.7086 | -0.2432 | -0.2641 | 0.5962 | 0.0210 | -199.7762 | -194.0263 | -2.4164 | -2.4252
0.8729 | 0.0 | 40 | 0.6981 | -0.1038 | -0.1416 | 0.6058 | 0.0377 | -198.5504 | -192.6330 | -2.4265 | -2.4351
0.838 | 0.0 | 50 | 0.6864 | 0.2782 | 0.2234 | 0.6058 | 0.0549 | -194.9011 | -188.8124 | -2.4238 | -2.4335
0.7253 | 0.0 | 60 | 0.6779 | 0.5564 | 0.4647 | 0.5865 | 0.0917 | -192.4881 | -186.0311 | -2.4271 | -2.4351
0.5718 | 0.01 | 70 | 0.6798 | 0.8872 | 0.7700 | 0.5769 | 0.1172 | -189.4352 | -182.7231 | -2.4266 | -2.4337
0.6437 | 0.01 | 80 | 0.6759 | 1.0681 | 0.9182 | 0.5096 | 0.1500 | -187.9532 | -180.9136 | -2.4499 | -2.4314
0.6098 | 0.01 | 90 | 0.7191 | 0.5345 | -0.0458 | 0.5577 | 0.5803 | -197.5928 | -186.2494 | -2.4888 | -2.4677
0.461 | 0.01 | 100 | 0.6948 | 1.0460 | 0.6785 | 0.5481 | 0.3675 | -190.3493 | -181.1343 | -2.4687 | -2.4447
1.0876 | 0.01 | 110 | 0.7081 | 1.0687 | 0.9388 | 0.5288 | 0.1299 | -187.7468 | -180.9077 | -2.4276 | -2.4196
0.5964 | 0.01 | 120 | 0.7045 | 0.9387 | 0.7995 | 0.5673 | 0.1391 | -189.1394 | -182.2079 | -2.4186 | -2.4276
0.6637 | 0.01 | 130 | 0.7018 | 0.9248 | 0.7781 | 0.5865 | 0.1466 | -189.3533 | -182.3472 | -2.4240 | -2.4395
0.5702 | 0.01 | 140 | 0.6985 | 0.8728 | 0.7128 | 0.6058 | 0.1600 | -190.0070 | -182.8667 | -2.4273 | -2.4452
0.8064 | 0.01 | 150 | 0.6941 | 0.8313 | 0.6588 | 0.6058 | 0.1725 | -190.5471 | -183.2818 | -2.4245 | -2.4424
0.7656 | 0.01 | 160 | 0.6877 | 0.7222 | 0.5277 | 0.5962 | 0.1945 | -191.8579 | -184.3729 | -2.4206 | -2.4390
0.6725 | 0.01 | 170 | 0.6949 | 0.8229 | 0.6362 | 0.5865 | 0.1867 | -190.7732 | -183.3658 | -2.4268 | -2.4442
0.6524 | 0.01 | 180 | 0.7100 | 0.9856 | 0.8195 | 0.5673 | 0.1660 | -188.9394 | -181.7392 | -2.4317 | -2.4486
1.0287 | 0.02 | 190 | 0.7161 | 1.0244 | 0.8611 | 0.5769 | 0.1634 | -188.5242 | -181.3504 | -2.4263 | -2.4431
0.8451 | 0.02 | 200 | 0.7186 | 1.0966 | 0.9354 | 0.5769 | 0.1613 | -187.7810 | -180.6283 | -2.4266 | -2.4435
0.6098 | 0.02 | 210 | 0.7159 | 1.1066 | 0.9427 | 0.5865 | 0.1639 | -187.7074 | -180.5288 | -2.4209 | -2.4382
0.5698 | 0.02 | 220 | 0.7149 | 1.1019 | 0.9356 | 0.5962 | 0.1663 | -187.7789 | -180.5757 | -2.4156 | -2.4336
0.7013 | 0.02 | 230 | 0.7145 | 1.0913 | 0.9216 | 0.5962 | 0.1697 | -187.9192 | -180.6817 | -2.4142 | -2.4319
2.6822 | 0.02 | 240 | 0.7143 | 1.0768 | 0.9049 | 0.5962 | 0.1720 | -188.0860 | -180.8263 | -2.4140 | -2.4322
0.9203 | 0.02 | 250 | 0.7137 | 1.0722 | 0.8993 | 0.5962 | 0.1729 | -188.1420 | -180.8729 | -2.4148 | -2.4328
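
The reported results come from the last evaluation (step 250), which is not the best row by every metric. As a rough sketch, the snippet below scans the evaluation rows above for the lowest validation loss, the highest reward accuracy, and the largest reward margin. The tuples are transcribed from the table; the variable and function names are hypothetical.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table.
EVAL_ROWS = [
    (10, 0.6892, 0.5962, 0.0094), (20, 0.6918, 0.6346, 0.0130),
    (30, 0.7086, 0.5962, 0.0210), (40, 0.6981, 0.6058, 0.0377),
    (50, 0.6864, 0.6058, 0.0549), (60, 0.6779, 0.5865, 0.0917),
    (70, 0.6798, 0.5769, 0.1172), (80, 0.6759, 0.5096, 0.1500),
    (90, 0.7191, 0.5577, 0.5803), (100, 0.6948, 0.5481, 0.3675),
    (110, 0.7081, 0.5288, 0.1299), (120, 0.7045, 0.5673, 0.1391),
    (130, 0.7018, 0.5865, 0.1466), (140, 0.6985, 0.6058, 0.1600),
    (150, 0.6941, 0.6058, 0.1725), (160, 0.6877, 0.5962, 0.1945),
    (170, 0.6949, 0.5865, 0.1867), (180, 0.7100, 0.5673, 0.1660),
    (190, 0.7161, 0.5769, 0.1634), (200, 0.7186, 0.5769, 0.1613),
    (210, 0.7159, 0.5865, 0.1639), (220, 0.7149, 0.5962, 0.1663),
    (230, 0.7145, 0.5962, 0.1697), (240, 0.7143, 0.5962, 0.1720),
    (250, 0.7137, 0.5962, 0.1729),
]


def best_steps(rows):
    """Return the step that is best under each of three criteria."""
    return {
        "lowest_validation_loss": min(rows, key=lambda r: r[1])[0],
        "highest_accuracy": max(rows, key=lambda r: r[2])[0],
        "largest_margin": max(rows, key=lambda r: r[3])[0],
    }


summary = best_steps(EVAL_ROWS)
```

The three criteria pick different steps, so which checkpoint counts as "best" depends on the metric chosen; the card itself does not say which one, if any, was used for model selection.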

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.1
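
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a small sketch using the standard library's `importlib.metadata`; it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and `version_mismatches` is a hypothetical helper, not part of any library.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.0.1+cu117",
    "datasets": "2.15.0",
    "tokenizers": "0.15.1",
}


def version_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for each package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `version_mismatches(PINNED)` returns an empty dict when the environment matches the card exactly; a different version is not necessarily incompatible, only different from what was recorded.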
Model tree for tulidivyansh25/mistral-dpo