Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.07413

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Paper • 2404.15420 • Published Apr 23 • 7
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 250
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 43

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 83
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 14
Instruction-tuned Language Models are Better Knowledge Learners

Paper • 2402.12847 • Published Feb 20 • 24
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 24

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1 • 78
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 59
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 29
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 56

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 35
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 83
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 102
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 103

Papers - Attention - Mixture of Attention Heads (MoA)

Generalized multi head using RoPE

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 35

Papers - Megatron

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 35

Papers - University - Princeton University

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 35
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy

Paper • 2404.05238 • Published Apr 8 • 2
Cognitive Architectures for Language Agents

Paper • 2309.02427 • Published Sep 5, 2023 • 2
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

Paper • 2305.13571 • Published May 23, 2023 • 2

Papers - University - MIT

One-step Diffusion with Distribution Matching Distillation

Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 77
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1 • 11
Locating and Editing Factual Associations in GPT

Paper • 2202.05262 • Published Feb 10, 2022 • 1

Papers - Fine-tuning - DPO

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 43
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14 • 6
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 38
Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2

Crystalcareai/GemMoE-Beta-1

Text Generation • Updated Mar 20 • 14 • 79
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 35
Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23 • 57

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs