---
license: apache-2.0
---
Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing
Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B
Sheared-LLaMA-1.3B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset for pruning and continued pre-training. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. This model can be loaded with HuggingFace Transformers via
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
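For a quick check, here is a minimal generation sketch (the prompt and generation settings are illustrative, not from the paper):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pruned model and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")

# Generate a short continuation for an illustrative prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```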
## Downstream Tasks
We evaluate on an extensive set of downstream tasks, including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing open-source language models of comparable sizes; a sketch of how such an evaluation can be run is given after the tables below.
Model | # Pre-training Tokens | Average Performance |
---|---|---|
LLaMA2-7B | 2T | 64.6 |
### 1.3B
Model | # Pre-training Tokens | Average Performance |
---|---|---|
OPT-1.3B | 300B | 48.2 |
Pythia-1.4B | 300B | 48.9 |
Sheared-LLaMA-1.3B | 50B | 51.0 |
### 3B
Model | # Pre-training Tokens | Average Performance |
---|---|---|
OPT-2.7B | 300B | 51.4 |
Pythia-2.8B | 300B | 52.5 |
INCITE-Base-3B | 800B | 54.7 |
Open-LLaMA-3B-v1 | 1T | 55.1 |
Open-LLaMA-3B-v2 | 1T | 55.7 |
Sheared-LLaMA-2.7B | 50B | 56.7 |
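As referenced above, here is a minimal sketch of how such an evaluation can be run with EleutherAI's lm-evaluation-harness (the `lm_eval` package, version 0.4 or later). The task list and batch size are illustrative assumptions, not the exact suite behind the averages in the tables:

```python
# Hypothetical evaluation sketch using EleutherAI's lm-evaluation-harness (lm_eval >= 0.4).
# The task list is an assumption for illustration, not the exact suite used for the tables.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=princeton-nlp/Sheared-LLaMA-1.3B",
    tasks=["arc_easy", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)

# Print per-task metrics; averaging the primary metric over tasks gives a rough
# "Average Performance" comparable in spirit to the tables above.
for task, metrics in results["results"].items():
    print(task, metrics)
```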
## Bibtex
@article{xia2023sheared,
  title={Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning},
  author={Xia, Mengzhou and Gao, Tianyu and Zeng, Zhiyuan and Chen, Danqi},
  journal={arXiv preprint arXiv:2310.06694},
  year={2023}
}