Tags: PEFT, Safetensors, qwen2, alignment-handbook, trl, dpo, Generated from Trainer

Qwen2-7B-Instruct-SPPO-Function-call-v2.12

This model is a fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.8, trained with DPO on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set:

  • Loss: 0.3322
  • Rewards/chosen: 0.5523
  • Rewards/rejected: -0.7005
  • Rewards/accuracies: 0.9017
  • Rewards/margins: 1.2528
  • Logps/rejected: -278.7327
  • Logps/chosen: -129.0717
  • Logits/rejected: -0.5984
  • Logits/chosen: -0.7738
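
Since this repository contains a PEFT adapter on top of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.8, it can be loaded with the peft and transformers libraries. Below is a minimal inference sketch; the prompt and generation settings are illustrative, and if the tokenizer is not bundled with the adapter repository it can be loaded from the base model instead.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "khongtrunght/Qwen2-7B-Instruct-SPPO-Function-call-v2.12"

# Loads the base model and applies the adapter weights in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype="auto", device_map="auto"
)
# Assumes the tokenizer (and chat template) ships with the adapter repo;
# otherwise load it from the base model id.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "What is the weather in Hanoi today?"}]  # illustrative prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```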

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
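
The dpo and trl tags indicate the adapter was trained with Direct Preference Optimization via TRL (following the alignment-handbook recipes). The sketch below shows roughly how the hyperparameters above map onto a trl DPOConfig/DPOTrainer run; it is an assumption-laden outline, not the actual training script: the real PEFT settings, dataset mixing and formatting, and the 8-GPU distributed launch are omitted, and the LoraConfig values shown are illustrative.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.8"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# One of the three preference datasets listed above; converting it into the
# prompt/chosen/rejected format expected by DPOTrainer is omitted here.
train_dataset = load_dataset("argilla/dpo-mix-7k", split="train")

# Illustrative adapter settings; the actual PEFT configuration is not stated in the card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

args = DPOConfig(
    output_dir="qwen2-7b-dpo",          # illustrative output path
    learning_rate=1e-6,
    per_device_train_batch_size=1,       # 8 GPUs x grad accumulation 4 -> total train batch 32
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```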

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6806 | 0.0916 | 100 | 0.6816 | 0.0303 | 0.0099 | 0.6445 | 0.0205 | -264.5260 | -139.5110 | -0.5879 | -0.7638 |
| 0.5704 | 0.1832 | 200 | 0.5993 | 0.3495 | 0.1473 | 0.8237 | 0.2023 | -261.7780 | -133.1277 | -0.5881 | -0.7638 |
| 0.5032 | 0.2749 | 300 | 0.5313 | 0.5795 | 0.1792 | 0.8526 | 0.4003 | -261.1383 | -128.5271 | -0.5893 | -0.7651 |
| 0.4548 | 0.3665 | 400 | 0.4727 | 0.6406 | 0.0523 | 0.8844 | 0.5884 | -263.6780 | -127.3051 | -0.5901 | -0.7660 |
| 0.3823 | 0.4581 | 500 | 0.4235 | 0.6412 | -0.1314 | 0.8931 | 0.7726 | -267.3507 | -127.2934 | -0.5914 | -0.7672 |
| 0.3513 | 0.5497 | 600 | 0.3843 | 0.6087 | -0.3415 | 0.9133 | 0.9502 | -271.5532 | -127.9448 | -0.5936 | -0.7693 |
| 0.3444 | 0.6413 | 700 | 0.3571 | 0.5871 | -0.5028 | 0.9104 | 1.0898 | -274.7784 | -128.3763 | -0.5965 | -0.7721 |
| 0.3486 | 0.7329 | 800 | 0.3427 | 0.5681 | -0.6155 | 0.9104 | 1.1836 | -277.0341 | -128.7559 | -0.5971 | -0.7725 |
| 0.3317 | 0.8246 | 900 | 0.3349 | 0.5586 | -0.6739 | 0.9133 | 1.2326 | -278.2013 | -128.9451 | -0.5993 | -0.7748 |
| 0.3077 | 0.9162 | 1000 | 0.3328 | 0.5530 | -0.6974 | 0.9075 | 1.2504 | -278.6715 | -129.0585 | -0.5998 | -0.7754 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1