ZoninSh committed
Commit bdb8592
Parent: d3772cf

ZoninSh/openhermes-mistral-dpo-gpt
README.md CHANGED
@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6047
-- Rewards/chosen: -0.0917
-- Rewards/rejected: -4.9453
-- Rewards/accuracies: 0.5625
-- Rewards/margins: 4.8536
-- Logps/rejected: -466.4970
-- Logps/chosen: -173.4612
-- Logits/rejected: -1.5579
-- Logits/chosen: -1.5934
+- Loss: 3.2500
+- Rewards/chosen: -1.0975
+- Rewards/rejected: -1.6306
+- Rewards/accuracies: 0.625
+- Rewards/margins: 0.5331
+- Logps/rejected: -307.3866
+- Logps/chosen: -331.8629
+- Logits/rejected: -2.4077
+- Logits/chosen: -2.3038
 
 ## Model description
 
@@ -49,18 +49,19 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- training_steps: 50
+- training_steps: 300
 - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.8027 | 0.01 | 10 | 0.8159 | 0.0214 | 0.4647 | 0.1875 | -0.4432 | -412.3975 | -172.3304 | -1.6414 | -1.6412 |
-| 1.1614 | 0.01 | 20 | 0.9902 | 0.1251 | 1.0701 | 0.3125 | -0.9450 | -406.3427 | -171.2931 | -1.6348 | -1.6337 |
-| 0.7986 | 0.01 | 30 | 0.6237 | 0.0767 | -0.6890 | 0.5625 | 0.7657 | -423.9343 | -171.7779 | -1.6135 | -1.6290 |
-| 7.3338 | 0.02 | 40 | 0.6306 | -0.1257 | -4.7694 | 0.5625 | 4.6437 | -464.7380 | -173.8018 | -1.5634 | -1.5933 |
-| 2.144 | 0.03 | 50 | 0.6047 | -0.0917 | -4.9453 | 0.5625 | 4.8536 | -466.4970 | -173.4612 | -1.5579 | -1.5934 |
+| 4.2921 | 0.03 | 50 | 9.8028 | -5.3862 | 0.1060 | 0.1875 | -5.4922 | -290.0201 | -374.7499 | -2.2861 | -2.1795 |
+| 9.75 | 0.05 | 100 | 8.8191 | -12.7493 | -8.6505 | 0.3125 | -4.0989 | -377.5849 | -448.3811 | -2.2836 | -2.2309 |
+| 3.2104 | 0.07 | 150 | 0.8915 | -3.5710 | -6.0350 | 0.375 | 2.4640 | -351.4305 | -356.5982 | -2.6543 | -2.5955 |
+| 2.655 | 0.1 | 200 | 0.3207 | -1.0209 | -4.6027 | 0.6875 | 3.5818 | -337.1074 | -331.0971 | -2.4341 | -2.3534 |
+| 4.8481 | 0.12 | 250 | 1.1311 | -0.8147 | -2.3072 | 0.625 | 1.4926 | -314.1525 | -329.0346 | -2.3257 | -2.2374 |
+| 3.1598 | 0.15 | 300 | 3.2500 | -1.0975 | -1.6306 | 0.625 | 0.5331 | -307.3866 | -331.8629 | -2.4077 | -2.3038 |
 
 
 ### Framework versions
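In both versions of the evaluation summary, Rewards/margins is simply Rewards/chosen minus Rewards/rejected (the DPO reward gap between the preferred and dispreferred completions). A quick sanity check against the final row of each table:

```python
# DPO reward margin: margin = reward_chosen - reward_rejected.
# Values below are taken from the final evaluation rows of the two
# README versions (old: step 50, new: step 300).
def dpo_margin(reward_chosen: float, reward_rejected: float) -> float:
    return reward_chosen - reward_rejected

old = dpo_margin(-0.0917, -4.9453)   # old README reports 4.8536
new = dpo_margin(-1.0975, -1.6306)   # new README reports 0.5331
print(round(old, 4), round(new, 4))
```

Note that a large margin alone is not a quality signal: the new run trades margin for a higher Rewards/accuracies (0.625 vs 0.5625).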
adapter_config.json CHANGED
@@ -17,7 +17,12 @@
   "revision": null,
   "target_modules": [
     "q_proj",
-    "v_proj"
+    "gate_proj",
+    "v_proj",
+    "down_proj",
+    "k_proj",
+    "o_proj",
+    "up_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
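This change widens the LoRA adapter from two attention projections (`q_proj`, `v_proj`) to all attention and MLP projections, which is what makes `adapter_model.safetensors` grow from ~13.6 MB to ~83.9 MB below. A back-of-the-envelope check of that size jump, assuming Mistral-7B's dimensions (hidden 4096, GQA KV width 1024, MLP width 14336, 32 layers), a LoRA rank of 16, and fp16/bf16 weights; none of these values appear in this diff hunk, so they are assumptions:

```python
# Estimate LoRA adapter file size: each adapted module of shape (d_in, d_out)
# adds rank * (d_in + d_out) parameters per layer (the A and B matrices).
# Assumed Mistral-7B shapes and rank=16 -- not shown in the diff itself.
RANK, LAYERS, BYTES_PER_PARAM = 16, 32, 2  # rank 16, 32 layers, fp16/bf16

shapes = {
    "q_proj": (4096, 4096), "k_proj": (4096, 1024), "v_proj": (4096, 1024),
    "o_proj": (4096, 4096), "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336), "down_proj": (14336, 4096),
}

def lora_bytes(modules):
    params = sum(RANK * (i + o) for i, o in (shapes[m] for m in modules)) * LAYERS
    return params * BYTES_PER_PARAM

print(lora_bytes(["q_proj", "v_proj"]))  # ~13.6 MB, close to the old 13,648,432
print(lora_bytes(shapes))                # ~83.9 MB, close to the new 83,945,296
```

The estimates land within a few tens of kilobytes of the actual file sizes; the remainder is the safetensors header and tensor metadata.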
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:27c863c0cb0c4e370c8bed63b8b7d4730fcaee7186769db06dde9ffd849b2dc8
-size 13648432
+oid sha256:86a7fed3a974dbcdb88d1e4ada81ee85076e9ff04a6f8e08dcf960ee07b71633
+size 83945296
runs/Nov18_15-32-12_056656b68ebe/events.out.tfevents.1700321570.056656b68ebe.176.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7a7484c8de187593571c24ff14b4bf5479a5f7d6f885205fbd48e0a9c5d5296c
+size 14047
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d31571207586ff667fb56d4455c593cb794147063948e1b4dbb49b3da24bf221
+oid sha256:695023550747e11566513b0f3522c6ad468e4815fb86520c9add707c8f099aad
 size 4155
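The binary files in this commit are stored as Git LFS pointer stubs: a three-line text file recording the spec version, the SHA-256 of the real content, and its byte size, which is exactly what the diffs above are comparing. A minimal parser sketch for that pointer format (`parse_lfs_pointer` is a hypothetical helper, not part of any LFS tooling):

```python
# Parse a Git LFS pointer file: "key value" pairs, one per line.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),  # hex digest only
        "size": int(fields["size"]),                   # size in bytes
    }

# The new training_args.bin pointer from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:695023550747e11566513b0f3522c6ad468e4815fb86520c9add707c8f099aad
size 4155
"""
print(parse_lfs_pointer(pointer)["size"])  # 4155
```

Because only the pointer is versioned, a one-byte change to the underlying binary shows up as a full oid swap, as in the `training_args.bin` hunk where the size stays 4155 but the oid changes.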