-
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 18 -
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Paper • 2402.13720 • Published • 5 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 28 -
Your Transformer is Secretly Linear
Paper • 2405.12250 • Published • 150
Collections
Discover the best community collections!
Collections including paper arxiv:2405.12981
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 40 -
Qwen Technical Report
Paper • 2309.16609 • Published • 34 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 5 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 45