---
library_name: transformers
license: llama3.1
base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-8B-Chat-v0.1
  results: []
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/MagpieLM-8B-Chat-v0.1-GGUF
This is a quantized version of [Magpie-Align/MagpieLM-8B-Chat-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1) created using llama.cpp.

# Original Model Card

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-8B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*

This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) that achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply a standard two-stage alignment pipeline with two carefully crafted synthetic datasets.

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

## 🔥 Benchmark Performance

Greedy Decoding

- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
- **Arena Hard: 48.4**
- **WildBench WB Score (v2.0625): 44.72**

**Benchmark Performance Compared with Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

## 👀 Other Information

**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

**Conversation Template**: Please use the Llama 3 chat template for the best performance.
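
For reference, the Llama 3 chat template wraps each turn in header and end-of-turn tokens. The helper below is only an illustrative sketch of that format; in practice you should rely on `tokenizer.apply_chat_template()`, which renders the template bundled with the tokenizer.

```python
# Illustrative sketch of the Llama 3 chat format this model expects.
# Prefer tokenizer.apply_chat_template() in real code.

def render_llama3_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in Llama 3 chat format."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Leave the prompt open at an assistant header so the model completes it.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3_chat([
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```
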

**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks, which may lead to suboptimal performance in those areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was applied during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-8B)

Please update transformers to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
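
When the pipeline is called with a message list, `generated_text` holds the running conversation with the model's reply appended as the last message. The snippet below mocks that output shape (an assumption based on recent Transformers chat-pipeline behavior; the reply text is invented) to show how the final answer is extracted:

```python
# Mocked pipeline output: `generated_text` carries the full conversation,
# with the assistant's reply appended as the last message. The reply
# content here is a placeholder, not real model output.
outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I'm Magpie, a friendly AI assistant."},
        ]
    }
]

reply = outputs[0]["generated_text"][-1]  # the newly generated turn
print(reply["role"], "->", reply["content"])
```
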

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and the configuration below for details.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
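
Note that the effective batch size follows from the settings above: 2 per device × 4 GPUs × 16 accumulation steps = 128, matching `total_train_batch_size`. The DPO objective itself is compact; the sketch below is an illustrative re-implementation of the standard per-pair DPO loss (not this project's training code), using the β = 0.01 from the DPOTrainer config.

```python
import math

# Illustrative re-implementation of the standard DPO loss for one
# preference pair (not the project's actual training code):
#   loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Effective batch size implied by the hyperparameters above:
effective_batch = 2 * 4 * 16  # per-device batch x num_devices x grad accumulation
print(effective_batch)  # 128, matching total_train_batch_size

# At initialization the policy equals the reference, so the loss is log 2:
print(round(dpo_loss(-100.0, -120.0, -100.0, -120.0), 4))  # 0.6931
```
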

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.686 | 0.0653 | 100 | 0.6856 | -0.0491 | -0.0616 | 0.6480 | 0.0125 | -471.3315 | -478.8181 | -0.7034 | -0.7427 |
| 0.6218 | 0.1306 | 200 | 0.6277 | -0.6128 | -0.7720 | 0.6960 | 0.1591 | -542.3653 | -535.1920 | -0.7771 | -0.8125 |
| 0.5705 | 0.1959 | 300 | 0.5545 | -2.4738 | -3.0052 | 0.7270 | 0.5314 | -765.6894 | -721.2881 | -0.7894 | -0.8230 |
| 0.4606 | 0.2612 | 400 | 0.5081 | -2.6780 | -3.3782 | 0.7560 | 0.7002 | -802.9893 | -741.7116 | -0.6813 | -0.7247 |
| 0.4314 | 0.3266 | 500 | 0.4787 | -3.6697 | -4.6026 | 0.7630 | 0.9329 | -925.4283 | -840.8740 | -0.6189 | -0.6691 |
| 0.449 | 0.3919 | 600 | 0.4533 | -3.7414 | -4.8019 | 0.7820 | 1.0604 | -945.3563 | -848.0514 | -0.6157 | -0.6681 |
| 0.4538 | 0.4572 | 700 | 0.4350 | -4.3858 | -5.6549 | 0.7890 | 1.2690 | -1030.6561 | -912.4920 | -0.5789 | -0.6331 |
| 0.35 | 0.5225 | 800 | 0.4186 | -4.7129 | -6.1662 | 0.8010 | 1.4533 | -1081.7843 | -945.1964 | -0.5778 | -0.6347 |
| 0.4153 | 0.5878 | 900 | 0.4108 | -4.9836 | -6.5320 | 0.7970 | 1.5484 | -1118.3677 | -972.2631 | -0.5895 | -0.6474 |
| 0.3935 | 0.6531 | 1000 | 0.3999 | -4.4303 | -5.9370 | 0.8110 | 1.5067 | -1058.8646 | -916.9379 | -0.6016 | -0.6598 |
| 0.3205 | 0.7184 | 1100 | 0.3950 | -5.1884 | -6.8827 | 0.8010 | 1.6943 | -1153.4371 | -992.7452 | -0.5846 | -0.6452 |
| 0.3612 | 0.7837 | 1200 | 0.3901 | -5.0426 | -6.7179 | 0.8040 | 1.6753 | -1136.9619 | -978.1701 | -0.6046 | -0.6637 |
| 0.3058 | 0.8490 | 1300 | 0.3877 | -5.1224 | -6.8428 | 0.8040 | 1.7204 | -1149.4465 | -986.1475 | -0.6087 | -0.6690 |
| 0.3467 | 0.9144 | 1400 | 0.3871 | -5.2335 | -6.9809 | 0.8090 | 1.7474 | -1163.2629 | -997.2610 | -0.6071 | -0.6672 |
| 0.3197 | 0.9797 | 1500 | 0.3867 | -5.1502 | -6.8793 | 0.8080 | 1.7291 | -1153.0979 | -988.9237 | -0.6120 | -0.6722 |
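
As a quick sanity check on how to read the table, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, verified here against the first and last evaluation steps:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected, checked against
# two rows of the table above (step 100 and step 1500).
rows = [
    # (chosen, rejected, reported_margin)
    (-0.0491, -0.0616, 0.0125),   # step 100
    (-5.1502, -6.8793, 1.7291),   # step 1500
]
for chosen, rejected, margin in rows:
    assert abs((chosen - rejected) - margin) < 1e-4
print("margins consistent")  # prints "margins consistent"
```
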

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## 📚 Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]