Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published 4 days ago • 39
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published 21 days ago • 41
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published 20 days ago • 15
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published 20 days ago • 20
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Paper • 2406.18284 • Published 26 days ago • 17
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published 24 days ago • 84
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published 27 days ago • 18
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 62
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14 • 75
view article Article Fish Speech V1 - New Multilingual Open Source TTS Model By lengyue233 • May 3 • 10
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 61
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023 • 30
Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration Paper • 2307.05300 • Published Jul 11, 2023 • 18
Collaborative Score Distillation for Consistent Visual Synthesis Paper • 2307.04787 • Published Jul 4, 2023 • 27
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 78