mistral-dpo

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.0559
  • Rewards/chosen: -0.6622
  • Rewards/rejected: -5.8356
  • Rewards/accuracies: 1.0
  • Rewards/margins: 5.1735
  • Logps/rejected: -138.0126
  • Logps/chosen: -105.3292
  • Logits/rejected: -2.5356
  • Logits/chosen: -2.7185
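
The reward metrics above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs where the chosen reward is higher. The sketch below checks that relation against the reported numbers. It also includes a per-pair sigmoid DPO loss, which assumes the metric names follow the convention used by TRL's DPOTrainer (rewards already scaled by beta); the card does not say which trainer or loss variant was used, so treat `dpo_sigmoid_loss` as an illustrative helper rather than the author's code.

```python
import math

# Final evaluation values copied from the list above.
reward_chosen = -0.6622
reward_rejected = -5.8356
reported_margin = 5.1735

# Margin = chosen reward - rejected reward.
margin = reward_chosen - reward_rejected


def dpo_sigmoid_loss(r_chosen: float, r_rejected: float) -> float:
    """Per-pair loss -log(sigmoid(r_chosen - r_rejected)).

    Assumption: rewards are already multiplied by beta, as in the
    common sigmoid DPO formulation. Hypothetical helper, not from the card.
    """
    m = r_chosen - r_rejected
    # Numerically stable form of log(1 + exp(-m)).
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


print(f"margin = {margin:.4f} (reported {reported_margin})")
print(f"loss at zero margin = {dpo_sigmoid_loss(0.0, 0.0):.4f}")
```

The reported evaluation loss is an average of per-pair losses, so it need not equal the loss evaluated at the average margin.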

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 50
  • mixed_precision_training: Native AMP
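
To make the schedule concrete, here is a minimal pure-Python sketch of a linear schedule with warmup using the values listed above (peak learning rate 0.0002, 2 warmup steps, 50 training steps). It mirrors the usual definition of a linear warmup followed by linear decay to zero; the function name `linear_lr` is hypothetical, and the exact scheduler implementation used for this run is an assumption.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 2, total_steps: int = 50) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the evaluation steps reported in the training results.
for s in (0, 1, 2, 10, 20, 30, 40, 50):
    print(s, linear_lr(s))
```

With only 2 warmup steps, the rate reaches its peak almost immediately and then decays for the remaining 48 steps.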

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6666 | 0.01 | 10 | 0.5490 | 0.3763 | 0.0083 | 1.0 | 0.3680 | -79.5733 | -94.9446 | -2.6386 | -2.7333 |
| 0.439 | 0.01 | 20 | 0.2792 | 1.0686 | -0.2159 | 1.0 | 1.2845 | -81.8148 | -88.0209 | -2.6245 | -2.7868 |
| 0.1683 | 0.02 | 30 | 0.1116 | 1.0530 | -2.2150 | 1.0 | 3.2680 | -101.8059 | -88.1772 | -2.6157 | -2.7924 |
| 0.54 | 0.03 | 40 | 0.0719 | -0.1064 | -4.6952 | 1.0 | 4.5888 | -126.6084 | -99.7713 | -2.5649 | -2.7384 |
| 0.0965 | 0.03 | 50 | 0.0559 | -0.6622 | -5.8356 | 1.0 | 5.1735 | -138.0126 | -105.3292 | -2.5356 | -2.7185 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model tree for Ram07/mistral-dpo