EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Paper • 2410.07133 • Published 6 days ago • 17 upvotes
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published 8 days ago • 42 upvotes
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24 • 20 upvotes
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 12 upvotes
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models Paper • 2403.06098 • Published Mar 10 • 15 upvotes
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32 upvotes
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21 upvotes
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16 upvotes
Consolidating Attention Features for Multi-view Image Editing Paper • 2402.14792 • Published Feb 22 • 7 upvotes
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 19 upvotes
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition Paper • 2312.07536 • Published Dec 12, 2023 • 16 upvotes
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 40 upvotes
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation Paper • 2402.11929 • Published Feb 19 • 9 upvotes
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 22 upvotes
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 22 upvotes
Boximator: Generating Rich and Controllable Motions for Video Synthesis Paper • 2402.01566 • Published Feb 2 • 26 upvotes
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5 • 17 upvotes
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5 • 17 upvotes
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 20 upvotes
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 65 upvotes
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Paper • 2401.11739 • Published Jan 22 • 16 upvotes
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper • 2401.08740 • Published Jan 16 • 11 upvotes
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis Paper • 2401.09048 • Published Jan 17 • 8 upvotes
InstructVideo: Instructing Video Diffusion Models with Human Feedback Paper • 2312.12490 • Published Dec 19, 2023 • 17 upvotes
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Paper • 2312.03641 • Published Dec 6, 2023 • 20 upvotes
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model Paper • 2312.02238 • Published Dec 4, 2023 • 25 upvotes
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence Paper • 2312.02087 • Published Dec 4, 2023 • 20 upvotes
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation Paper • 2212.11565 • Published Dec 22, 2022 • 3 upvotes
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation Paper • 2309.15818 • Published Sep 27, 2023 • 19 upvotes
MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper • 2310.08465 • Published Oct 12, 2023 • 14 upvotes