
	---

	library_name: transformers
	license: llama3.1
	base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	datasets:
	- Magpie-Align/MagpieLM-SFT-Data-v0.1
	- Magpie-Align/MagpieLM-DPO-Data-v0.1
	model-index:
	- name: MagpieLM-8B-Chat-v0.1
	results: []

	---

	[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


	# QuantFactory/MagpieLM-8B-Chat-v0.1-GGUF
	This is a quantized version of [Magpie-Align/MagpieLM-8B-Chat-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1), created using llama.cpp.
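The GGUF files in this repository target llama.cpp-compatible runtimes. The following is a minimal sketch, assuming the `llama-cpp-python` bindings (with `huggingface-hub`) are installed. The file name pattern `*Q4_K_M.gguf`, the context size, and the helper name `chat_with_gguf` are assumptions for illustration and are not stated in this card; check the repository's file list for the quantization you want.

```python
QUANT_REPO = "QuantFactory/MagpieLM-8B-Chat-v0.1-GGUF"


def chat_with_gguf(messages, filename_pattern="*Q4_K_M.gguf", max_tokens=256):
    """Run one chat completion against a GGUF file from QUANT_REPO."""
    # Imported lazily so this module can be loaded without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=QUANT_REPO,
        filename=filename_pattern,  # assumed pattern, not confirmed by the card
        n_ctx=4096,                 # illustrative context size
        verbose=False,
    )
    result = llm.create_chat_completion(messages=messages, max_tokens=max_tokens)
    return result["choices"][0]["message"]["content"]
```

Calling `chat_with_gguf([{"role": "user", "content": "Who are you?"}])` downloads the matching file on first use and returns the assistant's reply as a string.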

	# Original Model Card


	![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

	# 🐦 MagpieLM-8B-Chat-v0.1

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

	## 🧐 About This Model

	Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1

	This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) that achieves state-of-the-art performance among openly aligned small language models (SLMs). It outperforms open-weight instruction-tuned models of similar or larger size, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

	We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.

	We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
	* SFT Model Checkpoint: [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)

	We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

	## 🔥 Benchmark Performance

	Greedy Decoding

	- Alpaca Eval 2: 58.18 (LC), 62.38 (WR)
	- Arena Hard: 48.4
	- WildBench WB Score (v2.0625): 44.72

	Benchmark Performance Compared to Other SOTA SLMs

	![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

	## 👀 Other Information

	License: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

	Conversation Template: Please use the Llama 3 chat template for the best performance.
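For readers who want to see what the Llama 3 chat layout looks like, here is a minimal sketch that renders a message list by hand. The function name `format_llama3_prompt` is hypothetical, and the layout follows the publicly documented Llama 3 special tokens; the chat template bundled with the model's tokenizer remains the authoritative source and should be preferred in practice.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render role/content messages in the Llama 3 chat layout."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```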

	Limitations: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction-following and helpfulness, it isn't specifically designed for complex reasoning tasks, potentially leading to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

	## 🧐 How to use it?

	[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-8B)

	Please update transformers to the latest version with `pip install git+https://github.com/huggingface/transformers`.

	You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

	```python
	import torch
	import transformers

	# Full Hub ID of the original (unquantized) model, including the organization prefix
	model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model_id,
	    model_kwargs={"torch_dtype": torch.bfloat16},
	    device_map="auto",
	)

	messages = [
	    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
	    {"role": "user", "content": "Who are you?"},
	]

	outputs = pipeline(
	    messages,
	    max_new_tokens=256,
	)
	# The last message in the returned conversation is the assistant's reply
	print(outputs[0]["generated_text"][-1])
	```
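The Auto-class route mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' reference code: the helper name `generate_reply` is hypothetical, greedy decoding (`do_sample=False`) is chosen to mirror the benchmark setting, and the model ID assumes the unquantized weights from the Hub.

```python
MODEL_ID = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

MESSAGES = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]


def generate_reply(messages, model_id=MODEL_ID, max_new_tokens=256):
    """Generate one assistant turn with AutoTokenizer + AutoModelForCausalLM."""
    # Imported lazily so the constants above can be reused without loading torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Apply the tokenizer's bundled chat template and open an assistant turn.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply(MESSAGES)` loads the weights and returns the reply as plain text.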

	---
	# Alignment Pipeline

	The detailed alignment pipeline is as follows.

	## Stage 1: Supervised Fine-tuning

	We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and the config below for the detailed configuration.

	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.1`
	```yaml
	base_model: meta-llama/Meta-Llama-3.1-8B
	model_type: LlamaForCausalLM
	tokenizer_type: AutoTokenizer
	chat_template: llama3

	load_in_8bit: false
	load_in_4bit: false
	strict: false
	main_process_port: 0

	datasets:
	  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
	    type: sharegpt
	    conversation: llama3

	dataset_prepared_path: last_run_prepared
	val_set_size: 0.001
	output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

	sequence_len: 8192
	sample_packing: true
	eval_sample_packing: false
	pad_to_sequence_len: true

	wandb_project: SynDa
	wandb_entity:
	wandb_watch:
	wandb_name: MagpieLM-8B-SFT-v0.1
	wandb_log_model:
	hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

	gradient_accumulation_steps: 32
	micro_batch_size: 1
	num_epochs: 2
	optimizer: paged_adamw_8bit
	lr_scheduler: cosine
	learning_rate: 2e-5

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	  use_reentrant: false
	early_stopping_patience:
	resume_from_checkpoint:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_ratio: 0.1
	evals_per_epoch: 5
	eval_table_size:
	saves_per_epoch:
	debug:
	deepspeed:
	weight_decay: 0.0
	fsdp:
	fsdp_config:
	special_tokens:
	  pad_token: <|end_of_text|>
	```
	</details><br>
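Both training stages use a cosine learning-rate schedule with a warmup ratio of 0.1. The shape of such a schedule can be sketched as below. The function name and the total step count of 1000 are hypothetical (the card does not state the number of SFT steps), and the formula follows the common linear-warmup-then-cosine-decay pattern rather than any trainer's exact implementation.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, used only to show the shape of the curve.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps=1000))
```

With `peak_lr=2e-5` this corresponds to the SFT setting; passing `peak_lr=2e-7` gives the DPO setting.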

	## Stage 2: Direct Preference Optimization

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-07
	- train_batch_size: 2
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 128
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
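The total batch sizes in the list above follow from the per-device values. A short check, using only numbers from this list:

```python
per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 16

# Examples contributing to one optimizer update across all GPUs.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the devices multiply.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 16
```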

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.686 | 0.0653 | 100 | 0.6856 | -0.0491 | -0.0616 | 0.6480 | 0.0125 | -471.3315 | -478.8181 | -0.7034 | -0.7427 |
	| 0.6218 | 0.1306 | 200 | 0.6277 | -0.6128 | -0.7720 | 0.6960 | 0.1591 | -542.3653 | -535.1920 | -0.7771 | -0.8125 |
	| 0.5705 | 0.1959 | 300 | 0.5545 | -2.4738 | -3.0052 | 0.7270 | 0.5314 | -765.6894 | -721.2881 | -0.7894 | -0.8230 |
	| 0.4606 | 0.2612 | 400 | 0.5081 | -2.6780 | -3.3782 | 0.7560 | 0.7002 | -802.9893 | -741.7116 | -0.6813 | -0.7247 |
	| 0.4314 | 0.3266 | 500 | 0.4787 | -3.6697 | -4.6026 | 0.7630 | 0.9329 | -925.4283 | -840.8740 | -0.6189 | -0.6691 |
	| 0.449 | 0.3919 | 600 | 0.4533 | -3.7414 | -4.8019 | 0.7820 | 1.0604 | -945.3563 | -848.0514 | -0.6157 | -0.6681 |
	| 0.4538 | 0.4572 | 700 | 0.4350 | -4.3858 | -5.6549 | 0.7890 | 1.2690 | -1030.6561 | -912.4920 | -0.5789 | -0.6331 |
	| 0.35 | 0.5225 | 800 | 0.4186 | -4.7129 | -6.1662 | 0.8010 | 1.4533 | -1081.7843 | -945.1964 | -0.5778 | -0.6347 |
	| 0.4153 | 0.5878 | 900 | 0.4108 | -4.9836 | -6.5320 | 0.7970 | 1.5484 | -1118.3677 | -972.2631 | -0.5895 | -0.6474 |
	| 0.3935 | 0.6531 | 1000 | 0.3999 | -4.4303 | -5.9370 | 0.8110 | 1.5067 | -1058.8646 | -916.9379 | -0.6016 | -0.6598 |
	| 0.3205 | 0.7184 | 1100 | 0.3950 | -5.1884 | -6.8827 | 0.8010 | 1.6943 | -1153.4371 | -992.7452 | -0.5846 | -0.6452 |
	| 0.3612 | 0.7837 | 1200 | 0.3901 | -5.0426 | -6.7179 | 0.8040 | 1.6753 | -1136.9619 | -978.1701 | -0.6046 | -0.6637 |
	| 0.3058 | 0.8490 | 1300 | 0.3877 | -5.1224 | -6.8428 | 0.8040 | 1.7204 | -1149.4465 | -986.1475 | -0.6087 | -0.6690 |
	| 0.3467 | 0.9144 | 1400 | 0.3871 | -5.2335 | -6.9809 | 0.8090 | 1.7474 | -1163.2629 | -997.2610 | -0.6071 | -0.6672 |
	| 0.3197 | 0.9797 | 1500 | 0.3867 | -5.1502 | -6.8793 | 0.8080 | 1.7291 | -1153.0979 | -988.9237 | -0.6120 | -0.6722 |
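When reading the table, it helps to know that the `Rewards/margins` column is the chosen reward minus the rejected reward. Both rewards drift negative during training, but the rejected reward falls faster, so the margin grows. The sketch below checks this relationship on three rows copied from the table; the tolerance allows for rounding in the logged values.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the table
rows = [
    (100, -0.0491, -0.0616, 0.0125),
    (800, -4.7129, -6.1662, 1.4533),
    (1500, -5.1502, -6.8793, 1.7291),
]

for step, chosen, rejected, logged_margin in rows:
    recomputed = chosen - rejected
    # Logged values are rounded to four decimals, so compare with a tolerance.
    consistent = abs(recomputed - logged_margin) < 1e-3
    print(step, round(recomputed, 4), logged_margin, consistent)
```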


	### Framework versions

	- Transformers 4.44.2
	- Pytorch 2.4.1+cu121
	- Datasets 3.0.0
	- Tokenizers 0.19.1

	<details><summary>See alignment handbook configs</summary>

	```yaml
	# Customized Configs
	model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
	hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
	output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
	run_name: MagpieLM-8B-Chat-v0.1

	dataset_mixer:
	  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
	dataset_splits:
	- train
	- test
	preprocessing_num_workers: 24

	# DPOTrainer arguments
	bf16: true
	beta: 0.01
	learning_rate: 2.0e-7
	gradient_accumulation_steps: 16
	per_device_train_batch_size: 2
	per_device_eval_batch_size: 4
	num_train_epochs: 1
	max_length: 2048
	max_prompt_length: 1800
	warmup_ratio: 0.1
	logging_steps: 1
	lr_scheduler_type: cosine
	optim: adamw_torch

	torch_dtype: null
	# use_flash_attention_2: true
	do_eval: true
	evaluation_strategy: steps
	eval_steps: 100
	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	  use_reentrant: False
	log_level: info
	push_to_hub: true
	save_total_limit: 0
	seed: 42
	report_to:
	- wandb
	```
	</details><br>
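To make the role of `beta: 0.01` concrete, here is a minimal sketch of the per-example DPO objective, assuming the standard sigmoid form of the loss (the config does not set a loss type, so this is an assumption). The function name and the log-probability values in the usage lines are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Per-example DPO loss: -log(sigmoid(chosen_reward - rejected_reward))."""
    # Implicit rewards: beta-scaled log-probability ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) = softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss equals ln 2.
print(dpo_loss(-100.0, -100.0, -100.0, -100.0))
# Policy prefers the chosen response more than the reference does: the loss drops.
print(dpo_loss(-90.0, -110.0, -100.0, -100.0))
```

A policy that still equals its reference scores ln 2 ≈ 0.693, which is why the first logged training loss (0.686 at step 100) sits just below that value.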

	## 📚 Citation

	If you find the model, data, or code useful, please cite:
	```
	@article{xu2024magpie,
	title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
	author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
	year={2024},
	eprint={2406.08464},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```

	## Contact

	Questions? Contact:
	- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
	- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]