RAIJAY/NBERT_DPO

	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: mistralai/Mistral-7B-Instruct-v0.2
	model-index:
	- name: DPO_TEST_1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# DPO_TEST_1

	This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0229
	- Rewards/chosen: -740.3076
	- Rewards/rejected: -1059.6395
	- Rewards/accuracies: 0.9988
	- Rewards/margins: 319.3320
	- Logps/rejected: -10817.7158
	- Logps/chosen: -7838.3896
	- Logits/rejected: -32.6170
	- Logits/chosen: -26.3151
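
The card does not say how these metrics are computed. As a rough sketch, assuming the default sigmoid DPO loss without label smoothing (an assumption, since the loss type is not recorded here), the loss, accuracy, and margin can all be derived from per-example chosen and rejected rewards. The function name and the reward values below are hypothetical and are not taken from this model's evaluation set.

```python
import math


def dpo_batch_metrics(chosen_rewards, rejected_rewards):
    """Summarise a batch of DPO rewards (illustrative helper, not TRL's API)."""
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss per example: -log(sigmoid(m)) == softplus(-m),
    # written in a numerically stable form for large |m|.
    losses = [max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        # Fraction of pairs where the chosen reward beats the rejected one.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
    }


# Hypothetical rewards: three well-separated pairs and one misranked pair.
metrics = dpo_batch_metrics(
    chosen_rewards=[-700.0, -750.0, -800.0, -720.0],
    rejected_rewards=[-1000.0, -1100.0, -790.0, -1050.0],
)
print(metrics)
```

Under this assumption, a single misranked pair can dominate the mean loss even when the mean margin is large, which is one possible reading of a non-zero loss alongside an accuracy close to 1.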

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 2
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- training_steps: 25806
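
To make the relationship between these values concrete, here is a small sketch in plain Python. It assumes a single training device (consistent with 2 x 2 = 4) and a linear schedule that ramps up over the warmup steps and then decays to zero at the last step, which is the usual behaviour of a linear schedule with warmup. The helper names are illustrative only.

```python
LEARNING_RATE = 2e-4      # learning_rate
TRAIN_BATCH_SIZE = 2      # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps
WARMUP_STEPS = 2          # lr_scheduler_warmup_steps
TRAINING_STEPS = 25806    # training_steps


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 1, 2, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the schedule reaches the peak rate almost immediately and then decays for the rest of the run.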

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 37.7926 | 0.67 | 2867 | 2.8916 | -149.7567 | -197.0506 | 0.9359 | 47.2939 | -2191.8269 | -1932.8804 | 38.0669 | 4.1036 |
	| 84.7229 | 1.33 | 5734 | 2.0247 | -327.4202 | -450.9706 | 0.9656 | 123.5504 | -4731.0264 | -3709.5146 | -17.6232 | -16.2614 |
	| 0.4302 | 2.0 | 8601 | 0.2490 | -391.4300 | -536.8747 | 0.9923 | 145.4447 | -5590.0679 | -4349.6123 | -13.6537 | -12.7337 |
	| 0.6952 | 2.67 | 11468 | 0.0587 | -606.4489 | -775.4740 | 0.9970 | 169.0251 | -7976.0605 | -6499.8027 | 8.1646 | -0.2018 |
	| 0.2119 | 3.33 | 14335 | 0.2843 | -641.6364 | -925.0908 | 0.9907 | 283.4543 | -9472.2285 | -6851.6772 | -11.2088 | -13.0496 |
	| 0.129 | 4.0 | 17202 | 0.1065 | -706.7910 | -1019.4420 | 0.9958 | 312.6511 | -10415.7412 | -7503.2227 | 29.4650 | 10.0032 |
	| 0.1046 | 4.67 | 20069 | 0.1005 | -758.2514 | -1105.3041 | 0.9977 | 347.0525 | -11274.3594 | -8017.8281 | -37.3526 | -28.3912 |
	| 0.0656 | 5.33 | 22936 | 0.0241 | -790.2775 | -1078.3324 | 0.9986 | 288.0548 | -11004.6445 | -8338.0889 | -7.1017 | -13.6854 |
	| 0.0 | 6.0 | 25803 | 0.0229 | -740.3076 | -1059.6395 | 0.9988 | 319.3320 | -10817.7158 | -7838.3896 | -32.6170 | -26.3151 |
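
As a quick sanity check on the table, the reported Rewards/margins column should equal Rewards/chosen minus Rewards/rejected at each evaluation step, up to rounding. The sketch below copies the three reward columns from the table and checks that relationship. The tolerance of 1e-3 is an arbitrary choice to absorb rounding in the printed values.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins), copied from the table.
ROWS = [
    (2867, -149.7567, -197.0506, 47.2939),
    (5734, -327.4202, -450.9706, 123.5504),
    (8601, -391.4300, -536.8747, 145.4447),
    (11468, -606.4489, -775.4740, 169.0251),
    (14335, -641.6364, -925.0908, 283.4543),
    (17202, -706.7910, -1019.4420, 312.6511),
    (20069, -758.2514, -1105.3041, 347.0525),
    (22936, -790.2775, -1078.3324, 288.0548),
    (25803, -740.3076, -1059.6395, 319.3320),
]

TOLERANCE = 1e-3  # arbitrary slack for values rounded to four decimals


def margin_errors(rows):
    """Absolute gap between (chosen - rejected) and the reported margin, per step."""
    return {step: abs((chosen - rejected) - margin)
            for step, chosen, rejected, margin in rows}


errors = margin_errors(ROWS)
for step, err in errors.items():
    status = "ok" if err < TOLERANCE else "mismatch"
    print(f"step {step}: |recomputed - reported| = {err:.4f} ({status})")
```

Note that both chosen and rejected rewards become more negative over training while the gap between them widens, so the margin grows even though the chosen reward itself decreases.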


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- Pytorch 2.0.1
	- Datasets 2.16.1
	- Tokenizers 0.15.0
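
With the libraries listed above, loading the model might look like the following sketch. It assumes this repository (`RAIJAY/NBERT_DPO`, taken from the page header) holds PEFT adapter weights for the base model named in the metadata, which the `library_name: peft` field suggests but the card does not confirm. The function name `load_dpo_model` is illustrative, and the heavy imports are deferred so that nothing is downloaded until the function is called.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # base_model from the card metadata
ADAPTER_ID = "RAIJAY/NBERT_DPO"  # assumption: the repo contains a PEFT adapter


def load_dpo_model(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model and attach the adapter (downloads several GB of weights)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16,  # half precision to reduce memory use
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


# Usage (not executed here because it needs network access and a lot of memory):
# model, tokenizer = load_dpo_model()
```

Device placement is left to the caller. Because the intended uses and limitations are not documented in this card, outputs should be evaluated before relying on the model.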