INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding • arXiv:2409.06210 • Published Sep 2024 • 24 upvotes
Evaluating Multiview Object Consistency in Humans and Image Models • arXiv:2409.05862 • Published Sep 2024 • 8 upvotes
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct • arXiv:2409.05840 • Published Sep 2024 • 42 upvotes
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation • arXiv:2409.03718 • Published Sep 2024 • 23 upvotes
Compositional 3D-aware Video Generation with LLM Director • arXiv:2409.00558 • Published Sep 2024 • 14 upvotes
Building and better understanding vision-language models: insights and future directions • arXiv:2408.12637 • Published Aug 2024 • 109 upvotes
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders • arXiv:2408.15998 • Published Aug 2024 • 80 upvotes
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model • arXiv:2408.16767 • Published Aug 2024 • 28 upvotes
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation • arXiv:2408.15239 • Published Aug 2024 • 27 upvotes
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher • arXiv:2408.14176 • Published Aug 2024 • 58 upvotes
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation • arXiv:2408.13252 • Published Aug 2024 • 23 upvotes
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation • arXiv:2408.12528 • Published Aug 2024 • 50 upvotes
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning • arXiv:2408.11001 • Published Aug 2024 • 11 upvotes
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model • arXiv:2408.11039 • Published Aug 2024 • 53 upvotes
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model • arXiv:2408.10198 • Published Aug 2024 • 32 upvotes
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • arXiv:2408.08872 • Published Aug 2024 • 96 upvotes
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling • arXiv:2408.04810 • Published Aug 9, 2024 • 22 upvotes
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models • arXiv:2408.04594 • Published Aug 8, 2024 • 14 upvotes
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics • arXiv:2408.04631 • Published Aug 8, 2024 • 8 upvotes
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion • arXiv:2408.03178 • Published Aug 6, 2024 • 34 upvotes
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization • arXiv:2408.02555 • Published Aug 5, 2024 • 27 upvotes
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling • arXiv:2408.01291 • Published Aug 2, 2024 • 11 upvotes
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement • arXiv:2408.00653 • Published Aug 1, 2024 • 27 upvotes
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention • arXiv:2407.19918 • Published Jul 29, 2024 • 47 upvotes
Theia: Distilling Diverse Vision Foundation Models for Robot Learning • arXiv:2407.20179 • Published Jul 29, 2024 • 45 upvotes
SHIC: Shape-Image Correspondences with no Keypoint Supervision • arXiv:2407.18907 • Published Jul 26, 2024 • 38 upvotes
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency • arXiv:2407.17470 • Published Jul 24, 2024 • 14 upvotes
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions • arXiv:2407.15187 • Published Jul 21, 2024 • 10 upvotes
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation • arXiv:2407.14505 • Published Jul 19, 2024 • 24 upvotes
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control • arXiv:2407.12781 • Published Jul 17, 2024 • 12 upvotes
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models • arXiv:2407.12772 • Published Jul 17, 2024 • 32 upvotes
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation • arXiv:2407.11394 • Published Jul 16, 2024 • 11 upvotes
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? • arXiv:2407.04842 • Published Jul 5, 2024 • 52 upvotes
VIMI: Grounding Video Generation through Multi-modal Instruction • arXiv:2407.06304 • Published Jul 8, 2024 • 9 upvotes
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing • arXiv:2407.08770 • Published Jul 11, 2024 • 19 upvotes
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs • arXiv:2406.16860 • Published Jun 24, 2024 • 54 upvotes
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation • arXiv:2407.02371 • Published Jul 2, 2024 • 49 upvotes
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models • arXiv:2407.07895 • Published Jul 10, 2024 • 40 upvotes
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output • arXiv:2407.03320 • Published Jul 3, 2024 • 92 upvotes
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion • arXiv:2407.01392 • Published Jul 1, 2024 • 39 upvotes
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models • arXiv:2407.02687 • Published Jul 2, 2024 • 22 upvotes
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency • arXiv:2407.02398 • Published Jul 2, 2024 • 14 upvotes