Sheared-LLaMA-1.3B / README.md
License: apache-2.0

Sheared-LLaMA-1.3B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset to prune the model and continue pre-training it. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. The model can be loaded with the Hugging Face transformers library via:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
```
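For completeness, a minimal end-to-end sketch of loading the model together with its tokenizer and generating text is shown below. This assumes the `transformers` and `torch` packages are installed; the weights are downloaded from the Hub on first use, and the prompt and decoding settings are illustrative.

```python
# Sketch: load Sheared-LLaMA-1.3B and generate text with Hugging Face transformers.
# Assumes `transformers` and `torch` are installed; weights download on first use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "princeton-nlp/Sheared-LLaMA-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(prompt: str, max_new_tokens: int = 32) -> str:
    # Tokenize the prompt, run greedy generation, and decode the result.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate("The capital of France is"))
```

The same weights also work with `AutoModelForCausalLM.from_pretrained(..., torch_dtype=...)` for half-precision loading on GPU, since this is a standard LLaMA-architecture checkpoint.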

Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing
Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B


Downstream Tasks

We evaluate on an extensive set of downstream tasks, including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing open-source language models of comparable size.

| Model | # Pre-training Tokens | Average Performance |
|---|---|---|
| LLaMA2-7B | 2T | 64.6 |
| **1.3B** | | |
| OPT-1.3B | 300B | 48.2 |
| Pythia-1.4B | 300B | 48.9 |
| Sheared-LLaMA-1.3B | 50B | 51.0 |
| **3B** | | |
| OPT-2.7B | 300B | 51.4 |
| Pythia-2.8B | 300B | 52.5 |
| INCITE-Base-3B | 800B | 54.7 |
| Open-LLaMA-3B-v1 | 1T | 55.1 |
| Open-LLaMA-3B-v2 | 1T | 55.7 |
| Sheared-LLaMA-2.7B | 50B | 56.7 |

BibTeX

```bibtex
@article{xia2023sheared,
  title={Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning},
  author={Xia, Mengzhou and Gao, Tianyu and Zeng, Zhiyuan and Chen, Danqi},
  journal={arXiv preprint arXiv:2310.06694},
  year={2023}
}
```