	---
	license: apache-2.0
	base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: mistral-dpo
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# mistral-dpo

	This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ), trained with DPO using TRL on an unspecified dataset (the Trainer recorded no dataset name).
	It achieves the following results on the evaluation set:
	- Loss: 0.7137
	- Rewards/chosen: 1.0722
	- Rewards/rejected: 0.8993
	- Rewards/accuracies: 0.5962
	- Rewards/margins: 0.1729
	- Logps/rejected: -188.1420
	- Logps/chosen: -180.8729
	- Logits/rejected: -2.4148
	- Logits/chosen: -2.4328
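
The reward metrics above are linked by a simple identity: the margin is the chosen reward minus the rejected reward. The sketch below checks that identity against the reported numbers and shows the per-pair loss, assuming training used the standard sigmoid DPO objective (the card states neither the loss variant nor β). The helper name `dpo_pair_loss` is illustrative and not part of TRL.

```python
import math

# Final evaluation metrics copied from this card.
rewards_chosen = 1.0722
rewards_rejected = 0.8993
reported_margin = 0.1729

# Rewards/margins is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-4


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)).

    Assumption: the standard sigmoid objective; the card does not say
    which loss variant was used.
    """
    return math.log1p(math.exp(-pair_margin))


# A pair with zero margin contributes ln(2), about 0.693.
zero_margin_loss = dpo_pair_loss(0.0)
# Loss that every pair would have if each had exactly the mean margin.
loss_at_mean_margin = dpo_pair_loss(margin)

print(f"margin={margin:.4f}")
print(f"loss at zero margin={zero_margin_loss:.4f}")
print(f"loss at mean margin={loss_at_mean_margin:.4f}")
```

Under that assumption, the reported loss of 0.7137 sits above the loss evaluated at the mean margin. That is what an average over pairs with widely spread margins would look like, since pairs ranked the wrong way add more loss than correctly ranked pairs remove.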

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 1
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- training_steps: 250
	- mixed_precision_training: Native AMP
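
To make the schedule concrete, here is a minimal sketch of the learning rate implied by the values above. It assumes the `linear` scheduler has the usual shape: linear warmup from zero over the warmup steps, then linear decay to zero at the final training step. The function `linear_lr` is a hypothetical helper written for this illustration, not a library call.

```python
# Hyperparameters copied from the list above.
BASE_LR = 0.0002
WARMUP_STEPS = 2
TRAINING_STEPS = 250


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: fall linearly from the base learning rate to 0 at the last step.
    remaining = TRAINING_STEPS - step
    return BASE_LR * max(0.0, remaining / (TRAINING_STEPS - WARMUP_STEPS))


for step in (0, 1, 2, 126, 250):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the rate reaches its peak of 0.0002 almost immediately and then decays for the remaining 248 steps.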

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.7053 | 0.0 | 10 | 0.6892 | 0.0818 | 0.0724 | 0.5962 | 0.0094 | -196.4110 | -190.7768 | -2.3946 | -2.4152 |
	| 0.6805 | 0.0 | 20 | 0.6918 | -0.0354 | -0.0484 | 0.6346 | 0.0130 | -197.6190 | -191.9491 | -2.4006 | -2.4157 |
	| 0.834 | 0.0 | 30 | 0.7086 | -0.2432 | -0.2641 | 0.5962 | 0.0210 | -199.7762 | -194.0263 | -2.4164 | -2.4252 |
	| 0.8729 | 0.0 | 40 | 0.6981 | -0.1038 | -0.1416 | 0.6058 | 0.0377 | -198.5504 | -192.6330 | -2.4265 | -2.4351 |
	| 0.838 | 0.0 | 50 | 0.6864 | 0.2782 | 0.2234 | 0.6058 | 0.0549 | -194.9011 | -188.8124 | -2.4238 | -2.4335 |
	| 0.7253 | 0.0 | 60 | 0.6779 | 0.5564 | 0.4647 | 0.5865 | 0.0917 | -192.4881 | -186.0311 | -2.4271 | -2.4351 |
	| 0.5718 | 0.01 | 70 | 0.6798 | 0.8872 | 0.7700 | 0.5769 | 0.1172 | -189.4352 | -182.7231 | -2.4266 | -2.4337 |
	| 0.6437 | 0.01 | 80 | 0.6759 | 1.0681 | 0.9182 | 0.5096 | 0.1500 | -187.9532 | -180.9136 | -2.4499 | -2.4314 |
	| 0.6098 | 0.01 | 90 | 0.7191 | 0.5345 | -0.0458 | 0.5577 | 0.5803 | -197.5928 | -186.2494 | -2.4888 | -2.4677 |
	| 0.461 | 0.01 | 100 | 0.6948 | 1.0460 | 0.6785 | 0.5481 | 0.3675 | -190.3493 | -181.1343 | -2.4687 | -2.4447 |
	| 1.0876 | 0.01 | 110 | 0.7081 | 1.0687 | 0.9388 | 0.5288 | 0.1299 | -187.7468 | -180.9077 | -2.4276 | -2.4196 |
	| 0.5964 | 0.01 | 120 | 0.7045 | 0.9387 | 0.7995 | 0.5673 | 0.1391 | -189.1394 | -182.2079 | -2.4186 | -2.4276 |
	| 0.6637 | 0.01 | 130 | 0.7018 | 0.9248 | 0.7781 | 0.5865 | 0.1466 | -189.3533 | -182.3472 | -2.4240 | -2.4395 |
	| 0.5702 | 0.01 | 140 | 0.6985 | 0.8728 | 0.7128 | 0.6058 | 0.1600 | -190.0070 | -182.8667 | -2.4273 | -2.4452 |
	| 0.8064 | 0.01 | 150 | 0.6941 | 0.8313 | 0.6588 | 0.6058 | 0.1725 | -190.5471 | -183.2818 | -2.4245 | -2.4424 |
	| 0.7656 | 0.01 | 160 | 0.6877 | 0.7222 | 0.5277 | 0.5962 | 0.1945 | -191.8579 | -184.3729 | -2.4206 | -2.4390 |
	| 0.6725 | 0.01 | 170 | 0.6949 | 0.8229 | 0.6362 | 0.5865 | 0.1867 | -190.7732 | -183.3658 | -2.4268 | -2.4442 |
	| 0.6524 | 0.01 | 180 | 0.7100 | 0.9856 | 0.8195 | 0.5673 | 0.1660 | -188.9394 | -181.7392 | -2.4317 | -2.4486 |
	| 1.0287 | 0.02 | 190 | 0.7161 | 1.0244 | 0.8611 | 0.5769 | 0.1634 | -188.5242 | -181.3504 | -2.4263 | -2.4431 |
	| 0.8451 | 0.02 | 200 | 0.7186 | 1.0966 | 0.9354 | 0.5769 | 0.1613 | -187.7810 | -180.6283 | -2.4266 | -2.4435 |
	| 0.6098 | 0.02 | 210 | 0.7159 | 1.1066 | 0.9427 | 0.5865 | 0.1639 | -187.7074 | -180.5288 | -2.4209 | -2.4382 |
	| 0.5698 | 0.02 | 220 | 0.7149 | 1.1019 | 0.9356 | 0.5962 | 0.1663 | -187.7789 | -180.5757 | -2.4156 | -2.4336 |
	| 0.7013 | 0.02 | 230 | 0.7145 | 1.0913 | 0.9216 | 0.5962 | 0.1697 | -187.9192 | -180.6817 | -2.4142 | -2.4319 |
	| 2.6822 | 0.02 | 240 | 0.7143 | 1.0768 | 0.9049 | 0.5962 | 0.1720 | -188.0860 | -180.8263 | -2.4140 | -2.4322 |
	| 0.9203 | 0.02 | 250 | 0.7137 | 1.0722 | 0.8993 | 0.5962 | 0.1729 | -188.1420 | -180.8729 | -2.4148 | -2.4328 |
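
The evaluation metrics in the table do not improve monotonically, so it can help to scan them programmatically. The sketch below copies the step, validation loss, reward accuracy, and reward margin columns from the table and reports where each is best. The tuple layout and variable names are choices made for this illustration only.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin) copied from the table.
rows = [
    (10, 0.6892, 0.5962, 0.0094), (20, 0.6918, 0.6346, 0.0130),
    (30, 0.7086, 0.5962, 0.0210), (40, 0.6981, 0.6058, 0.0377),
    (50, 0.6864, 0.6058, 0.0549), (60, 0.6779, 0.5865, 0.0917),
    (70, 0.6798, 0.5769, 0.1172), (80, 0.6759, 0.5096, 0.1500),
    (90, 0.7191, 0.5577, 0.5803), (100, 0.6948, 0.5481, 0.3675),
    (110, 0.7081, 0.5288, 0.1299), (120, 0.7045, 0.5673, 0.1391),
    (130, 0.7018, 0.5865, 0.1466), (140, 0.6985, 0.6058, 0.1600),
    (150, 0.6941, 0.6058, 0.1725), (160, 0.6877, 0.5962, 0.1945),
    (170, 0.6949, 0.5865, 0.1867), (180, 0.7100, 0.5673, 0.1660),
    (190, 0.7161, 0.5769, 0.1634), (200, 0.7186, 0.5769, 0.1613),
    (210, 0.7159, 0.5865, 0.1639), (220, 0.7149, 0.5962, 0.1663),
    (230, 0.7145, 0.5962, 0.1697), (240, 0.7143, 0.5962, 0.1720),
    (250, 0.7137, 0.5962, 0.1729),
]

# Evaluation step with the lowest validation loss.
best_loss_row = min(rows, key=lambda row: row[1])
# Evaluation step with the highest reward accuracy.
best_accuracy_row = max(rows, key=lambda row: row[2])
# Evaluation step with the largest reward margin.
best_margin_row = max(rows, key=lambda row: row[3])

print("lowest validation loss:", best_loss_row)
print("highest reward accuracy:", best_accuracy_row)
print("largest reward margin:", best_margin_row)
```

On these numbers the lowest validation loss occurs at step 80, the highest reward accuracy at step 20, and the largest reward margin at step 90, whereas the headline metrics on this card come from the final step, 250.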


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.0.1+cu117
	- Datasets 2.15.0
	- Tokenizers 0.15.1
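
When reproducing results, it can be useful to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The PyPI distribution names in `PINNED` (in particular `torch` for PyTorch) are assumptions, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card; the distribution names are assumptions.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.0.1+cu117",
    "datasets": "2.15.0",
    "tokenizers": "0.15.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


for package, (expected, found) in version_mismatches(PINNED).items():
    print(f"{package}: card lists {expected}, found {found or 'not installed'}")
```

A mismatch does not necessarily prevent the model from loading; the check only highlights where the environment differs from the one recorded at training time.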