---
license: other
tags:
- generated_from_trainer
- axolotl
base_model: Qwen/Qwen2-72B
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-110B/blob/main/LICENSE
model-index:
- name: dolphin-2.9.2-qwen2-72b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 40.38
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 47.7
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 21.37
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.0
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 17.04
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.52
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-72b
      name: Open LLM Leaderboard
---

# Dolphin 2.9.2 Qwen2 72B 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.2:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node

This model is based on Qwen2-72B and is governed by the [tongyi-qianwen license](LICENSE).

The base model has a 128k context window; the full-weight fine-tuning used an 8k sequence length.

This model was trained with full-weight fine-tuning (FFT) on parameters selected by [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py), using the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```
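
For convenience, here is a minimal inference sketch with Hugging Face Transformers; the tokenizer's bundled chat template renders exactly this ChatML layout, so you rarely need to build the prompt by hand (the generation settings below are illustrative, not prescribed by this card):

```python
# Minimal inference sketch (illustrative, not an official snippet): the
# tokenizer's chat template emits the ChatML prompt shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.2-qwen2-72b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a 72B model needs multiple large GPUs or offloading
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about the tide."},
]
# add_generation_prompt=True appends the open "<|im_start|>assistant" turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```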

Dolphin-2.9.2 has a variety of instruction, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

Dolphin is licensed according to Qwen's tongyi-qianwen license. We grant permission for any use, including commercial, that complies with that license. Dolphin was trained on data generated by GPT-4, among other models.

## Evals

![image/png](https://i.ibb.co/B4x1Ddr/file-2ao0fl-K2-B2-Hmka-Epd0ja-QY0x.webp)

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: Qwen/Qwen2-72B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

# load_in_8bit: true
# load_in_4bit: false
# strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.2/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/SystemChat_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.62.mlp.down_proj
- model.layers.63.mlp.down_proj
- model.layers.66.mlp.down_proj
- model.layers.65.mlp.down_proj
- model.layers.64.mlp.down_proj
- model.layers.67.mlp.down_proj
- model.layers.68.mlp.down_proj
- model.layers.60.mlp.down_proj
- model.layers.31.mlp.down_proj
- model.layers.69.mlp.down_proj
- model.layers.61.mlp.down_proj
- model.layers.59.mlp.down_proj
- model.layers.70.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.76.mlp.down_proj
- model.layers.72.mlp.down_proj
- model.layers.77.mlp.down_proj
- model.layers.71.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.58.mlp.down_proj
- model.layers.75.mlp.down_proj
- model.layers.32.mlp.down_proj
- model.layers.56.mlp.down_proj
- model.layers.28.mlp.down_proj
- model.layers.26.mlp.down_proj
- model.layers.33.mlp.down_proj
- model.layers.34.mlp.down_proj
- model.layers.57.mlp.down_proj
- model.layers.27.mlp.down_proj
- model.layers.25.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.73.mlp.down_proj
- model.layers.24.mlp.down_proj
- model.layers.78.mlp.down_proj
- model.layers.74.mlp.down_proj
- model.layers.54.mlp.down_proj
# mlp.gate_proj layers
- model.layers.78.mlp.gate_proj
- model.layers.77.mlp.gate_proj
- model.layers.76.mlp.gate_proj
- model.layers.79.mlp.gate_proj
- model.layers.75.mlp.gate_proj
- model.layers.74.mlp.gate_proj
- model.layers.73.mlp.gate_proj
- model.layers.70.mlp.gate_proj
- model.layers.72.mlp.gate_proj
- model.layers.71.mlp.gate_proj
- model.layers.69.mlp.gate_proj
- model.layers.54.mlp.gate_proj
- model.layers.68.mlp.gate_proj
- model.layers.57.mlp.gate_proj
- model.layers.63.mlp.gate_proj
- model.layers.49.mlp.gate_proj
- model.layers.55.mlp.gate_proj
- model.layers.53.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.67.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.62.mlp.gate_proj
- model.layers.64.mlp.gate_proj
- model.layers.48.mlp.gate_proj
- model.layers.66.mlp.gate_proj
- model.layers.52.mlp.gate_proj
- model.layers.40.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.65.mlp.gate_proj
- model.layers.61.mlp.gate_proj
- model.layers.59.mlp.gate_proj
# mlp.up_proj layers
- model.layers.69.mlp.up_proj
- model.layers.70.mlp.up_proj
- model.layers.71.mlp.up_proj
- model.layers.68.mlp.up_proj
- model.layers.67.mlp.up_proj
- model.layers.66.mlp.up_proj
- model.layers.46.mlp.up_proj
- model.layers.63.mlp.up_proj
- model.layers.72.mlp.up_proj
- model.layers.64.mlp.up_proj
- model.layers.62.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.65.mlp.up_proj
- model.layers.73.mlp.up_proj
- model.layers.47.mlp.up_proj
- model.layers.44.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.74.mlp.up_proj
- model.layers.75.mlp.up_proj
- model.layers.57.mlp.up_proj
- model.layers.76.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.61.mlp.up_proj
- model.layers.40.mlp.up_proj
- model.layers.56.mlp.up_proj
- model.layers.60.mlp.up_proj
- model.layers.31.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.55.mlp.up_proj
- model.layers.32.mlp.up_proj
- model.layers.41.mlp.up_proj
- model.layers.33.mlp.up_proj
- model.layers.58.mlp.up_proj
# self_attn.k_proj layers
- model.layers.79.self_attn.k_proj
- model.layers.36.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.74.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.78.self_attn.k_proj
- model.layers.77.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.38.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.69.self_attn.k_proj
- model.layers.42.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.70.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.63.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.68.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.30.self_attn.k_proj
- model.layers.66.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.65.self_attn.k_proj
- model.layers.57.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.64.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.75.self_attn.k_proj
- model.layers.40.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.61.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.39.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.69.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.42.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.29.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.46.self_attn.o_proj
- model.layers.52.self_attn.o_proj
- model.layers.26.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.41.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.49.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.47.self_attn.o_proj
- model.layers.53.self_attn.o_proj
- model.layers.27.self_attn.o_proj
- model.layers.37.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.43.self_attn.o_proj
- model.layers.44.self_attn.o_proj
- model.layers.45.self_attn.o_proj
- model.layers.30.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.3.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.18.self_attn.q_proj
- model.layers.25.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.61.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.55.self_attn.q_proj
- model.layers.54.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.68.self_attn.q_proj
- model.layers.49.self_attn.q_proj
- model.layers.48.self_attn.q_proj
- model.layers.52.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.42.self_attn.q_proj
- model.layers.57.self_attn.q_proj
- model.layers.60.self_attn.q_proj
- model.layers.53.self_attn.q_proj
- model.layers.64.self_attn.q_proj
- model.layers.66.self_attn.q_proj
- model.layers.62.self_attn.q_proj
- model.layers.59.self_attn.q_proj
- model.layers.50.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.24.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.26.self_attn.v_proj
- model.layers.27.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.32.self_attn.v_proj
- model.layers.33.self_attn.v_proj
- model.layers.34.self_attn.v_proj
- model.layers.35.self_attn.v_proj
- model.layers.36.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.53.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.58.self_attn.v_proj
- model.layers.59.self_attn.v_proj
- model.layers.61.self_attn.v_proj
- model.layers.63.self_attn.v_proj
- model.layers.64.self_attn.v_proj
- model.layers.65.self_attn.v_proj
- model.layers.66.self_attn.v_proj
- model.layers.69.self_attn.v_proj
- model.layers.74.self_attn.v_proj
- model.layers.75.self_attn.v_proj
- model.layers.76.self_attn.v_proj
- model.layers.72.self_attn.v_proj

chat_template: chatml
dataset_prepared_path: qwen2-72b-data
val_set_size: 0.01
output_dir: qwen2-72b

sequence_len: 8192 # supports up to 8192
sample_packing: true
pad_to_sequence_len: true

# adapter: lora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# lora_fan_in_fan_out:

wandb_project: qwen2-72b
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 4
save_total_limit: 2
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  eos_token: "<|im_end|>"
```
</details>
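
Each `datasets` entry in the config uses axolotl's `sharegpt` loader with the ChatML conversation mapping. For orientation, a ShareGPT-style record looks roughly like this (invented values, not drawn from the actual training files; assembled in Python because each line of a `.jsonl` file holds one such JSON object):

```python
import json

# Illustrative ShareGPT-style record: a "conversations" list of from/value
# turns, serialized as one JSON object per line of the .jsonl file.
record = {
    "conversations": [
        {"from": "system", "value": "You are Dolphin, a helpful AI assistant."},
        {"from": "human", "value": "Explain what a .jsonl file is."},
        {"from": "gpt", "value": "A text file with one JSON object per line."},
    ]
}
print(json.dumps(record))
```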
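
The `unfrozen_parameters` block above is what makes this a targeted full-weight fine-tune: training touches only the tensors whose names match a listed pattern, and everything else stays frozen. A rough PyTorch sketch of that behavior (a hypothetical equivalent, not axolotl's actual implementation):

```python
# Hypothetical sketch of what the `unfrozen_parameters` option achieves:
# freeze the whole model, then re-enable gradients only for parameters
# whose fully qualified names match one of the configured patterns.
import re

UNFROZEN_PATTERNS = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.62.mlp.down_proj",  # ...plus the other Laser Scanner picks
]

def apply_unfrozen_parameters(model, patterns=UNFROZEN_PATTERNS):
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        # A parameter trains only if some pattern matches its name.
        param.requires_grad = any(rx.search(name) for rx in compiled)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} of {total:,} parameters")
```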

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cognitivecomputations__dolphin-2.9.2-qwen2-72b).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 32.00 |
| IFEval (0-Shot)     | 40.38 |
| BBH (3-Shot)        | 47.70 |
| MATH Lvl 5 (4-Shot) | 21.37 |
| GPQA (0-shot)       | 16.00 |
| MuSR (0-shot)       | 17.04 |
| MMLU-PRO (5-shot)   | 49.52 |