Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker • Blog post • Apr 8, 2021
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models • Paper • arXiv:2407.01906 • Published 4 days ago
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • Paper • arXiv:2407.01370 • Published 4 days ago
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale • Paper • arXiv:2406.17557 • Published 10 days ago
How Do Large Language Models Acquire Factual Knowledge During Pretraining? • Paper • arXiv:2406.11813 • Published 18 days ago
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model • Paper • arXiv:2405.04434 • Published May 7
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper • arXiv:2404.02258 • Published Apr 2
Improved Baselines with Visual Instruction Tuning • Paper • arXiv:2310.03744 • Published Oct 5, 2023
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models • Paper • arXiv:2402.13064 • Published Feb 20