
zephyr-dpop-qlora-uf-oursuf6k-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF6k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0972
  • Positive Losses: 4.6136
  • Dpo Losses: 0.6513
  • Rewards/chosen: 0.1036
  • Rewards/rejected: -0.0071
  • Rewards/accuracies: 0.6270
  • Rewards/margins: 0.1107
  • Rewards/margins Max: 0.4366
  • Rewards/margins Min: -0.1898
  • Rewards/margins Std: 0.2797
  • Logps/rejected: -259.8883
  • Logps/chosen: -274.8616
  • Logits/rejected: -2.7883
  • Logits/chosen: -2.8318
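
This repository ships a QLoRA (PEFT) adapter on top of alignment-handbook/zephyr-7b-sft-full, so inference works by attaching the adapter to the base model. The snippet below is a minimal usage sketch, not part of this repository: it assumes the adapter id just1nseo/zephyr-dpop-qlora-uf-oursuf6k-5e-6, that the adapter repo includes a tokenizer, and that a GPU with enough memory is available.

```python
# Minimal inference sketch (assumptions noted in comments; not an official example).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/zephyr-dpop-qlora-uf-oursuf6k-5e-6"

# Loads the base model referenced in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Assumption: the adapter repo carries a tokenizer; otherwise load it from
# "alignment-handbook/zephyr-7b-sft-full" instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Zephyr models expect the chat template to be applied before generation.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```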

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
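
For reference, the listed values roughly correspond to the transformers TrainingArguments below. This is only a hedged sketch: the actual training script (presumably an alignment-handbook / TRL DPO recipe), the LoRA configuration, the DPO beta, and the mixed-precision setting are not documented in this card, and the output directory name is assumed.

```python
# Sketch of how the hyperparameters above map onto transformers.TrainingArguments.
# Only the values listed in this card are reproduced; everything else is left at
# its defaults (Adam betas=(0.9, 0.999) and epsilon=1e-08 are already the defaults).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-oursuf6k-5e-6",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                       # assumption: bf16 mixed precision; not stated in the card
)
# Effective batch sizes: 2 per device x 8 GPUs = 16 for training and 4 x 8 = 32 for
# evaluation, matching total_train_batch_size and total_eval_batch_size above.
```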

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6612 | 0.15 | 100 | 0.7167 | 0.3423 | 0.6805 | 0.0814 | 0.0535 | 0.5913 | 0.0279 | 0.1341 | -0.0698 | 0.0908 | -253.8317 | -277.0839 | -2.8097 | -2.8549 |
| 0.6652 | 0.29 | 200 | 0.7424 | 0.6158 | 0.6684 | 0.1444 | 0.0868 | 0.6071 | 0.0576 | 0.2387 | -0.1086 | 0.1564 | -250.5054 | -270.7805 | -2.7612 | -2.8047 |
| 0.6493 | 0.44 | 300 | 0.7586 | 0.7357 | 0.6609 | 0.1701 | 0.0950 | 0.6151 | 0.0751 | 0.2766 | -0.1065 | 0.1714 | -249.6817 | -268.2108 | -2.7656 | -2.8106 |
| 0.6224 | 0.58 | 400 | 0.9943 | 3.3747 | 0.6529 | 0.1119 | 0.0109 | 0.6389 | 0.1009 | 0.3836 | -0.1621 | 0.2434 | -258.0921 | -274.0359 | -2.7767 | -2.8199 |
| 0.5674 | 0.73 | 500 | 1.1831 | 5.7365 | 0.6565 | 0.0641 | -0.0334 | 0.6270 | 0.0975 | 0.4143 | -0.1884 | 0.2702 | -262.5242 | -278.8098 | -2.7934 | -2.8376 |
| 0.5749 | 0.88 | 600 | 1.0992 | 4.5979 | 0.6512 | 0.1035 | -0.0073 | 0.6190 | 0.1109 | 0.4368 | -0.1884 | 0.2790 | -259.9164 | -274.8698 | -2.7839 | -2.8279 |
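
In these logs, Rewards/margins is the gap between Rewards/chosen and Rewards/rejected, as the per-row values confirm (e.g., at step 100: 0.0814 − 0.0535 = 0.0279).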

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2