- Shape of Motion: 4D Reconstruction from a Single Video — arXiv:2407.13764 (published 5 days ago)
- DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation — arXiv:2407.11394 (published 8 days ago)
- Scaling Diffusion Transformers to 16 Billion Parameters — arXiv:2407.11633 (published 7 days ago)
- Generalizable Implicit Motion Modeling for Video Frame Interpolation — arXiv:2407.08680 (published 12 days ago)
- OmniNOCS: A Unified NOCS Dataset and Model for 3D Lifting of 2D Objects — arXiv:2407.08711 (published 12 days ago)
- VEnhancer: Generative Space-Time Enhancement for Video Generation — arXiv:2407.07667 (published 13 days ago)
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively — arXiv:2401.02955 (published Jan 5)
- MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos — arXiv:2406.08407 (published Jun 12)
- SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound — arXiv:2406.06612 (published Jun 6)
- MagicPose4D: Crafting Articulated Models with Appearance and Motion Control — arXiv:2405.14017 (published May 22)
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion — arXiv:2406.03184 (published Jun 5)
- V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation — arXiv:2406.02511 (published Jun 4)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors — arXiv:2406.01493 (published Jun 3)
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation — arXiv:2405.20674 (published May 31)
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback — arXiv:2405.18750 (published May 29)
- Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning — arXiv:2405.18386 (published May 28)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control — arXiv:2405.17414 (published May 27)
- Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition — arXiv:2405.15216 (published May 24)
- Part123: Part-aware 3D Reconstruction from a Single-view Image — arXiv:2405.16888 (published May 27)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer — arXiv:2405.17405 (published May 27)
- Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels — arXiv:2405.16822 (published May 27)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models — arXiv:2405.16537 (published May 26)
- Looking Backward: Streaming Video-to-Video Translation with Feature Banks — arXiv:2405.15757 (published May 24)
- Look Once to Hear: Target Speech Hearing with Noisy Examples — arXiv:2405.06289 (published May 10)
- Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling — arXiv:2405.14847 (published May 23)
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data — arXiv:2405.14333 (published May 23)
- Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras — arXiv:2405.14866 (published May 23)
- Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices — arXiv:2405.12211 (published May 20)
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching — arXiv:2405.11252 (published May 18)
- Imp: Highly Capable Large Multimodal Models for Mobile Devices — arXiv:2405.12107 (published May 20)
- FIFO-Diffusion: Generating Infinite Videos from Text without Training — arXiv:2405.11473 (published May 19)
- INDUS: Effective and Efficient Language Models for Scientific Applications — arXiv:2405.10725 (published May 17)
- TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction — arXiv:2405.10315 (published May 16)
- Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion — arXiv:2405.09874 (published May 16)
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models — arXiv:2405.10314 (published May 16)
- Compositional Text-to-Image Generation with Dense Blob Representations — arXiv:2405.08246 (published May 14)
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning — arXiv:2405.08054 (published May 13)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots — arXiv:2405.07990 (published May 13)
- LogoMotion: Visually Grounded Code Generation for Content-Aware Animation — arXiv:2405.07065 (published May 11)
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation — arXiv:2405.01434 (published May 2)
- Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting — arXiv:2404.19758 (published Apr 30)
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance — arXiv:2401.16465 (published Jan 29)
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation — arXiv:2404.19427 (published Apr 30)