Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.11355

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 36
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 19

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 12 days ago • 26
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 10 days ago • 119
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Paper • 2409.14988 • Published 6 days ago • 20

Diffusion Models

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26 • 58
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 120
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published 27 days ago • 12

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 61
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22 • 33
DreamCinema: Cinematic Transfer with Free Camera and 3D Character

Paper • 2408.12601 • Published Aug 22 • 28
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22 • 13

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 36
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 12 days ago • 26

Generative Model

SelfEval: Leveraging the discriminative nature of generative models for evaluation

Paper • 2311.10708 • Published Nov 17, 2023 • 14
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 12 days ago • 78
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 12 days ago • 55
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 12 days ago • 26

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 28
Wavelets Are All You Need for Autoregressive Image Generation

Paper • 2406.19997 • Published Jun 28 • 28
ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Paper • 2407.17365 • Published Jul 24 • 11
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Paper • 2408.11001 • Published Aug 20 • 11

Stable Diffusion

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published Jun 11 • 30
Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22 • 23
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 12 days ago • 26

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

about 5 hours ago

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14 • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14 • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs