flydust committed on
Commit 5ee112a • 1 Parent(s): e2744d8

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -25,7 +25,7 @@ model-index:
 
 *Model full name: Llama3.1-MagpieLM-4B-Chat-v0.1*
 
- This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.
+ This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.
 
 We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. Feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)
 
@@ -34,6 +34,8 @@ We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://hugging
 
 We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.
 
+ [*See more powerful 8B version here!*](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1)
+
 ## 🔥 Benchmark Performance
 
 Greedy Decoding
@@ -44,16 +46,20 @@ Greedy Decoding
 
 **Benchmark Performance Compare to Other SOTA SLMs**
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/lMZ9M2h_9fJsjrw0BmPVD.png)
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/cNigvzqznKWRy1YfktZ6J.jpeg)
 
 ## 👀 Other Information
 
 **License**: Please follow [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
 
- **Conversation Template**: Please use the Llama 3 chat template for the best performance.
+ **Conversation Template**: Please use the **Llama 3 chat template** for the best performance.
+
+ **Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction-following and helpfulness, it isn't specifically designed for complex reasoning tasks, potentially leading to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training were implemented during the alignment process.
 
 ## 🧐 How to use it?
 
+ [![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-4B)
+
 Please update transformers to the latest version by `pip install git+https://github.com/huggingface/transformers`.
 
 You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
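
The usage instructions carried as context in the hunk above point readers to the Transformers `pipeline` abstraction and, per the new **Conversation Template** line, the Llama 3 chat template. A minimal sketch of that flow follows; the repository id `Magpie-Align/MagpieLM-4B-Chat-v0.1` is an assumption inferred from the model's full name and the naming of the linked 8B sibling, not something stated in this diff.

```python
# Minimal inference sketch, not the README's verbatim example.
# Assumption: the model is hosted as "Magpie-Align/MagpieLM-4B-Chat-v0.1"
# (inferred from the 8B sibling's repo name); adjust the id if it differs.
import torch
from transformers import pipeline

model_id = "Magpie-Align/MagpieLM-4B-Chat-v0.1"

# The pipeline applies the model's (Llama 3) chat template to the messages.
chatbot = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Explain greedy decoding in one sentence."},
]

# do_sample=False gives greedy decoding, matching the benchmark setting above.
outputs = chatbot(messages, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"][-1]["content"])
```

Equivalent results can be obtained with `AutoModelForCausalLM`/`AutoTokenizer` plus `apply_chat_template` and `generate()`, as the README notes.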
@@ -161,12 +167,11 @@ special_tokens:
   pad_token: <|end_of_text|>
 
 ```
-
 </details><br>
 
 ## Stage 2: Direct Preference Optimization
 
- ## Training procedure
+ We use [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
 
 ### Training hyperparameters
 
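
The final hunk swaps the generic "Training procedure" heading for a pointer to the alignment handbook for the DPO stage. For illustration only, and not the authors' recipe, a bare-bones DPO run on the named dataset with TRL (the library the alignment handbook builds on) could look roughly like this; the SFT checkpoint path, dataset split, and every hyperparameter are placeholders.

```python
# Hypothetical DPO sketch with TRL; NOT the MagpieLM training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_checkpoint = "path/to/your-sft-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)

# Preference data named in the README; its columns may need mapping to the
# prompt/chosen/rejected layout DPOTrainer expects, and "train" is assumed.
dataset = load_dataset("Magpie-Align/MagpieLM-DPO-Data-v0.1", split="train")

args = DPOConfig(
    output_dir="magpielm-dpo",
    beta=0.1,                        # placeholder KL-penalty strength
    learning_rate=5e-7,              # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # reference model is created automatically
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # older TRL versions use tokenizer= instead
)
trainer.train()
```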