princeton-nlp committed
Commit 83be984
1 Parent(s): 59aab8d

Update README.md

Files changed (1):
  1. README.md +8 -9

README.md CHANGED
@@ -2,20 +2,19 @@
 license: apache-2.0
 ---

+**Paper**: [https://arxiv.org/pdf/2310.06694.pdf](https://arxiv.org/pdf/2310.06694.pdf)
+**Code**: https://github.com/princeton-nlp/LLM-Shearing
+**Models**: [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B), [Sheared-LLaMA-2.7B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B)
+
+---
+
 Sheared-LLaMA-1.3B is a model pruned and further pre-trained from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). We dynamically load data from different domains in the [RedPajama dataset](https://github.com/togethercomputer/RedPajama-Data) to prune and continue pre-training the model. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. This model can be loaded with HuggingFace Transformers via

 ```
 model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
 ```

-**Paper**: [https://arxiv.org/pdf/2310.06694.pdf](https://arxiv.org/pdf/2310.06694.pdf)
-**Code**: https://github.com/princeton-nlp/LLM-Shearing
-**Models**: [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B), [Sheared-LLaMA-2.7B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B)
-
----
-
-
-### Downstream Tasks
+## Downstream Tasks

 We evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing large language models of comparable sizes.

@@ -40,7 +39,7 @@
 | Open-LLaMA-3B-v2 | 1T | 55.7 |
 | Sheared-LLaMA-2.7B | 50B | 56.7 |

-### Bibtex
+## Bibtex
 ```
 @article{xia2023sheared,
 title={Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning},
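The one-line snippet in the README assumes `AutoModelForCausalLM` has already been imported. Below is a minimal end-to-end sketch, assuming the Hugging Face `transformers` library is installed; the prompt and generation settings are illustrative and not part of the model card.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the pruned, further pre-trained 1.3B checkpoint.
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")

# Illustrative use as plain text continuation (no instruction tuning is
# mentioned in this model card); the prompt and length are arbitrary.
inputs = tokenizer("Structured pruning of large language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```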
 
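The description's mention of dynamically loading data from different RedPajama domains refers to the dynamic batch loading procedure introduced in the paper. The sketch below only illustrates the general idea of up-weighting domains whose loss is still above a reference and sampling the next batch by weight; the domain names are RedPajama's, but the losses, step size, and update rule are placeholders rather than the actual procedure (see the paper and the LLM-Shearing repository for that).

```
import math
import random

# Schematic only: RedPajama's seven domains, starting from uniform weights.
domains = ["CommonCrawl", "C4", "GitHub", "Wikipedia", "Books", "ArXiv", "StackExchange"]
weights = {d: 1.0 / len(domains) for d in domains}

# Placeholder per-domain losses; in dynamic batch loading these would come from
# evaluating the pruned model against per-domain reference losses.
reference_loss = {d: 2.0 for d in domains}
current_loss = {d: 2.0 + random.uniform(-0.3, 0.3) for d in domains}

# Up-weight domains whose current loss exceeds the reference, then renormalize.
eta = 1.0  # arbitrary step size for illustration
scores = {
    d: weights[d] * math.exp(eta * max(current_loss[d] - reference_loss[d], 0.0))
    for d in domains
}
total = sum(scores.values())
weights = {d: s / total for d, s in scores.items()}

# Sample the domains for the next training batch according to the updated weights.
next_batch_domains = random.choices(domains, weights=[weights[d] for d in domains], k=8)
print(weights)
print(next_batch_domains)
```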