
zephyr-dpop-qlora-gpt4-5e-6-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/GPT4 dataset. It achieves the following results on the evaluation set (a sketch of how the reward metrics are conventionally computed follows the list):

  • Loss: 14.3852
  • Positive Losses: 141.6597
  • Dpo Losses: 0.6849
  • Rewards/chosen: -1.4061
  • Rewards/rejected: -2.0012
  • Rewards/accuracies: 0.6667
  • Rewards/margins: 0.5951
  • Rewards/margins Max: 2.2885
  • Rewards/margins Min: -1.0995
  • Rewards/margins Std: 1.4978
  • Logps/rejected: -459.3069
  • Logps/chosen: -425.8328
  • Logits/rejected: -2.2783
  • Logits/chosen: -2.3207
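The Rewards/* metrics above follow the usual DPO bookkeeping: the implicit reward for a response is beta times the gap between the policy's and the reference model's log-probability of that response, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin. The sketch below only illustrates this convention; the beta value and the exact aggregation used in the actual training run are assumptions, not values reported on this card.

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, ref_chosen_logps,
                       policy_rejected_logps, ref_rejected_logps,
                       beta=0.1):
    """Illustrative DPO-style reward bookkeeping.

    Inputs are per-example summed log-probabilities; beta=0.1 is an
    assumption, not a value taken from this card.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/margins_max": margins.max().item(),
        "rewards/margins_min": margins.min().item(),
        "rewards/margins_std": margins.std().item(),
    }

# Example with dummy values (batch of 3 response pairs):
metrics = dpo_reward_metrics(
    policy_chosen_logps=torch.tensor([-410.0, -420.0, -430.0]),
    ref_chosen_logps=torch.tensor([-400.0, -405.0, -415.0]),
    policy_rejected_logps=torch.tensor([-470.0, -455.0, -460.0]),
    ref_rejected_logps=torch.tensor([-440.0, -430.0, -435.0]),
)
```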

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
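
For reference, these settings map onto a transformers.TrainingArguments configuration roughly as sketched below; this is an illustration only, not the actual launch script, and output_dir is a placeholder. The total batch sizes of 16 (train) and 32 (eval) follow from the per-device sizes of 2 and 4 multiplied across the 8 GPUs.

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above (illustrative only;
# the DPO-specific options of the real training script are not shown here).
training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-gpt4-5e-6-epoch3",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # x 8 GPUs -> total train batch size 16
    per_device_eval_batch_size=4,    # x 8 GPUs -> total eval batch size 32
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```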

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5432 | 0.28 | 100 | 1.5490 | 8.3683 | 0.6723 | -0.0507 | -0.1015 | 0.5992 | 0.0508 | 0.2567 | -0.1414 | 0.1757 | -269.3354 | -290.2917 | -2.6677 | -2.7099 |
| 0.4843 | 0.56 | 200 | 3.6354 | 28.9322 | 0.6415 | -0.2537 | -0.4297 | 0.6349 | 0.1759 | 0.7364 | -0.3533 | 0.4858 | -302.1486 | -310.5943 | -2.5589 | -2.6000 |
| 0.2828 | 0.85 | 300 | 6.8046 | 61.7689 | 0.6346 | -0.6003 | -0.8503 | 0.6508 | 0.2500 | 1.0085 | -0.4868 | 0.6679 | -344.2117 | -345.2526 | -2.5349 | -2.5759 |
| 0.3355 | 1.13 | 400 | 11.4158 | 108.7399 | 0.6572 | -1.0761 | -1.4209 | 0.6548 | 0.3447 | 1.4626 | -0.7661 | 0.9968 | -401.2702 | -392.8341 | -2.3773 | -2.4155 |
| 0.3438 | 1.41 | 500 | 10.6413 | 101.3525 | 0.6381 | -1.0007 | -1.3406 | 0.6865 | 0.3399 | 1.3353 | -0.6338 | 0.8805 | -393.2457 | -385.2938 | -2.4471 | -2.4907 |
| 0.2144 | 1.69 | 600 | 8.5896 | 79.7998 | 0.6267 | -0.7817 | -1.2135 | 0.6865 | 0.4318 | 1.5951 | -0.6661 | 1.0047 | -380.5318 | -363.3914 | -2.3029 | -2.3438 |
| 0.3314 | 1.97 | 700 | 11.1651 | 107.2969 | 0.6525 | -1.0595 | -1.5150 | 0.6627 | 0.4555 | 1.7776 | -0.8450 | 1.1660 | -410.6869 | -391.1705 | -2.3025 | -2.3432 |
| 0.1352 | 2.25 | 800 | 13.3571 | 130.9070 | 0.6700 | -1.2986 | -1.8184 | 0.6627 | 0.5198 | 2.0225 | -0.9603 | 1.3296 | -441.0237 | -415.0786 | -2.2901 | -2.3320 |
| 0.2348 | 2.54 | 900 | 14.7241 | 145.9081 | 0.6904 | -1.4488 | -2.0053 | 0.6706 | 0.5564 | 2.1801 | -1.0958 | 1.4586 | -459.7108 | -430.1044 | -2.2661 | -2.3085 |
| 0.1369 | 2.82 | 1000 | 14.5955 | 143.9389 | 0.6869 | -1.4291 | -2.0251 | 0.6627 | 0.5959 | 2.2953 | -1.1073 | 1.5052 | -461.6887 | -428.1342 | -2.2738 | -2.3165 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
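
This repository contains a QLoRA/PEFT adapter rather than full model weights, so one way to run it is to load the base model and attach the adapter with PEFT, as in the minimal sketch below. The snippet assumes the adapter id just1nseo/zephyr-dpop-qlora-gpt4-5e-6-epoch3 and the base model named above; 4-bit quantized loading of the base model (as in QLoRA training) is optional and omitted here.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-gpt4-5e-6-epoch3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

# Zephyr-style chat prompt via the base tokenizer's chat template.
messages = [{"role": "user", "content": "Explain LoRA adapters in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```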