

Collections including paper arxiv:2405.12981

efficient inference

Hydragen: High-Throughput LLM Inference with Shared Prefixes

Paper • 2402.05099 • Published Feb 7 • 18

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting

Paper • 2402.13720 • Published Feb 21 • 5

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21 • 28

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 150

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5 • 40

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 34

GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 5

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 45
