Llama-3-Instruct-8B-SimPO-ExPO

This is the extrapolated (ExPO) model based on princeton-nlp/Llama-3-Instruct-8B-SimPO and meta-llama/Meta-Llama-3-8B-Instruct, as described in the "Weak-to-Strong Extrapolation Expedites Alignment" paper.

Specifically, we obtain this model by extrapolating (alpha = 0.3) from the weights of the SFT and DPO/RLHF checkpoints, which yields better alignment with human preference.
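The extrapolation step can be illustrated with a minimal sketch. It assumes the update takes the form aligned + alpha * (aligned - sft), i.e. moving past the aligned checkpoint in the direction away from the SFT checkpoint. The function name `extrapolate_weights` and the toy parameter values are hypothetical and are not taken from the official code. The function works on any values that support `+`, `-`, and scalar `*`, so the same logic would presumably apply to a pair of model state dicts with identical keys.

```python
from typing import Dict, TypeVar

T = TypeVar("T")  # any value supporting +, -, and scalar * (float, array, tensor)


def extrapolate_weights(sft: Dict[str, T], aligned: Dict[str, T], alpha: float = 0.3) -> Dict[str, T]:
    """Hypothetical helper: extrapolate from the SFT weights through the aligned weights.

    Assumed update rule: new = aligned + alpha * (aligned - sft).
    alpha = 0 returns the aligned weights unchanged.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("Both checkpoints must have the same parameter names.")
    return {
        name: aligned[name] + alpha * (aligned[name] - sft[name])
        for name in aligned
    }


# Toy example with scalar "weights" (illustrative values only).
sft = {"w": 1.0, "b": 0.0}
aligned = {"w": 2.0, "b": -1.0}
expo = extrapolate_weights(sft, aligned, alpha=0.3)
print(expo)
```

In this toy case each parameter moves a further 30% of the SFT-to-aligned distance beyond the aligned value. For this model card, the SFT role would presumably be played by meta-llama/Meta-Llama-3-8B-Instruct and the aligned role by princeton-nlp/Llama-3-Instruct-8B-SimPO.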

This extrapolated model achieves a 40.6% win rate and a 45.8% length-controlled (LC) win rate on AlpacaEval 2.0, outperforming the original Llama-3-Instruct-8B-SimPO's 40.5% and 44.7%, respectively.

Evaluation Results

Evaluation results on the AlpacaEval 2.0 benchmark (you can find the evaluation outputs on the official GitHub repo):

| Model | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| --- | --- | --- | --- | --- |
| HuggingFaceH4/zephyr-7b-alpha | 6.7% | 10.0% | 10.6% | 13.6% |
| HuggingFaceH4/zephyr-7b-beta | 10.2% | 13.2% | 11.1% | 14.0% |
| berkeley-nest/Starling-LM-7B-alpha | 15.0% | 18.3% | 18.2% | 19.5% |
| Nexusflow/Starling-LM-7B-beta | 26.6% | 25.8% | 29.6% | 26.4% |
| snorkelai/Snorkel-Mistral-PairRM | 24.7% | 24.0% | 28.8% | 26.4% |
| RLHFlow/LLaMA3-iterative-DPO-final | 29.2% | 36.0% | 32.7% | 37.8% |
| internlm/internlm2-chat-1.8b | 3.8% | 4.0% | 5.2% | 4.3% |
| internlm/internlm2-chat-7b | 20.5% | 18.3% | 28.1% | 22.7% |
| internlm/internlm2-chat-20b | 36.1% | 24.9% | 46.2% | 27.2% |
| allenai/tulu-2-dpo-7b | 8.5% | 10.2% | 11.5% | 11.7% |
| allenai/tulu-2-dpo-13b | 11.2% | 15.5% | 15.6% | 17.6% |
| allenai/tulu-2-dpo-70b | 15.4% | 21.2% | 23.0% | 25.7% |

Evaluation results on the MT-Bench benchmark (you can find the evaluation outputs on the official GitHub repo):

| Model | Original | + ExPO |
| --- | --- | --- |
| HuggingFaceH4/zephyr-7b-alpha | 6.85 | 6.87 |
| HuggingFaceH4/zephyr-7b-beta | 7.02 | 7.06 |
| berkeley-nest/Starling-LM-7B-alpha | 7.82 | 7.91 |
| Nexusflow/Starling-LM-7B-beta | 8.10 | 8.18 |
| snorkelai/Snorkel-Mistral-PairRM | 7.63 | 7.69 |
| RLHFlow/LLaMA3-iterative-DPO-final | 8.08 | 8.45 |
| internlm/internlm2-chat-1.8b | 5.17 | 5.26 |
| internlm/internlm2-chat-7b | 7.72 | 7.80 |
| internlm/internlm2-chat-20b | 8.13 | 8.26 |
| allenai/tulu-2-dpo-7b | 6.35 | 6.38 |
| allenai/tulu-2-dpo-13b | 7.00 | 7.26 |
| allenai/tulu-2-dpo-70b | 7.79 | 8.03 |