
zephyr-dpop-qlora-uf-oursuf6k-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF6k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0972
  • Positive Losses: 4.6136
  • Dpo Losses: 0.6513
  • Rewards/chosen: 0.1036
  • Rewards/rejected: -0.0071
  • Rewards/accuracies: 0.6270
  • Rewards/margins: 0.1107
  • Rewards/margins Max: 0.4366
  • Rewards/margins Min: -0.1898
  • Rewards/margins Std: 0.2797
  • Logps/rejected: -259.8883
  • Logps/chosen: -274.8616
  • Logits/rejected: -2.7883
  • Logits/chosen: -2.8318
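
This repository ships a QLoRA (PEFT) adapter on top of alignment-handbook/zephyr-7b-sft-full, so inference works by attaching the adapter to the base model. The snippet below is a minimal usage sketch, not part of this repository: it assumes the adapter id just1nseo/zephyr-dpop-qlora-uf-oursuf6k-5e-6, that the adapter repo includes a tokenizer, and that a GPU with enough memory is available.

```python
# Minimal inference sketch (assumptions noted in comments; not an official example).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/zephyr-dpop-qlora-uf-oursuf6k-5e-6"

# Loads the base model referenced in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Assumption: the adapter repo carries a tokenizer; otherwise load it from
# "alignment-handbook/zephyr-7b-sft-full" instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Zephyr models expect the chat template to be applied before generation.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```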

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
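
For reference, the listed values roughly correspond to the transformers TrainingArguments below. This is only a hedged sketch: the actual training script (presumably an alignment-handbook / TRL DPO recipe), the LoRA configuration, the DPO beta, and the mixed-precision setting are not documented in this card, and the output directory name is assumed.

```python
# Sketch of how the hyperparameters above map onto transformers.TrainingArguments.
# Only the values listed in this card are reproduced; everything else is left at
# its defaults (Adam betas=(0.9, 0.999) and epsilon=1e-08 are already the defaults).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-oursuf6k-5e-6",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                       # assumption: bf16 mixed precision; not stated in the card
)
# Effective batch sizes: 2 per device x 8 GPUs = 16 for training and 4 x 8 = 32 for
# evaluation, matching total_train_batch_size and total_eval_batch_size above.
```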

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6612 | 0.15 | 100 | 0.7167 | 0.3423 | 0.6805 | 0.0814 | 0.0535 | 0.5913 | 0.0279 | 0.1341 | -0.0698 | 0.0908 | -253.8317 | -277.0839 | -2.8097 | -2.8549 |
| 0.6652 | 0.29 | 200 | 0.7424 | 0.6158 | 0.6684 | 0.1444 | 0.0868 | 0.6071 | 0.0576 | 0.2387 | -0.1086 | 0.1564 | -250.5054 | -270.7805 | -2.7612 | -2.8047 |
| 0.6493 | 0.44 | 300 | 0.7586 | 0.7357 | 0.6609 | 0.1701 | 0.0950 | 0.6151 | 0.0751 | 0.2766 | -0.1065 | 0.1714 | -249.6817 | -268.2108 | -2.7656 | -2.8106 |
| 0.6224 | 0.58 | 400 | 0.9943 | 3.3747 | 0.6529 | 0.1119 | 0.0109 | 0.6389 | 0.1009 | 0.3836 | -0.1621 | 0.2434 | -258.0921 | -274.0359 | -2.7767 | -2.8199 |
| 0.5674 | 0.73 | 500 | 1.1831 | 5.7365 | 0.6565 | 0.0641 | -0.0334 | 0.6270 | 0.0975 | 0.4143 | -0.1884 | 0.2702 | -262.5242 | -278.8098 | -2.7934 | -2.8376 |
| 0.5749 | 0.88 | 600 | 1.0992 | 4.5979 | 0.6512 | 0.1035 | -0.0073 | 0.6190 | 0.1109 | 0.4368 | -0.1884 | 0.2790 | -259.9164 | -274.8698 | -2.7839 | -2.8279 |
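
In these logs, Rewards/margins is the gap between Rewards/chosen and Rewards/rejected, as the per-row values confirm (e.g., at step 100: 0.0814 − 0.0535 = 0.0279).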

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2