EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Paper • 2410.07133 • Published 6 days ago • 17 upvotes
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published 8 days ago • 42 upvotes
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24 • 20 upvotes
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 12 upvotes
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models Paper • 2403.06098 • Published Mar 10 • 15 upvotes
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32 upvotes
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21 upvotes
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16 upvotes
Consolidating Attention Features for Multi-view Image Editing Paper • 2402.14792 • Published Feb 22 • 7 upvotes
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 19 upvotes
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition Paper • 2312.07536 • Published Dec 12, 2023 • 16 upvotes
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 40 upvotes
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation Paper • 2402.11929 • Published Feb 19 • 9 upvotes
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 22 upvotes
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 22 upvotes
Boximator: Generating Rich and Controllable Motions for Video Synthesis Paper • 2402.01566 • Published Feb 2 • 26 upvotes
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5 • 17 upvotes
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5 • 17 upvotes
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 20 upvotes
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 65 upvotes
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Paper • 2401.11739 • Published Jan 22 • 16 upvotes
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper • 2401.08740 • Published Jan 16 • 11 upvotes
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis Paper • 2401.09048 • Published Jan 17 • 8 upvotes
InstructVideo: Instructing Video Diffusion Models with Human Feedback Paper • 2312.12490 • Published Dec 19, 2023 • 17 upvotes
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Paper • 2312.03641 • Published Dec 6, 2023 • 20 upvotes
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model Paper • 2312.02238 • Published Dec 4, 2023 • 25 upvotes
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence Paper • 2312.02087 • Published Dec 4, 2023 • 20 upvotes
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation Paper • 2212.11565 • Published Dec 22, 2022 • 3 upvotes
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation Paper • 2309.15818 • Published Sep 27, 2023 • 19 upvotes
MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper • 2310.08465 • Published Oct 12, 2023 • 14 upvotes