- Shape of Motion: 4D Reconstruction from a Single Video — arXiv:2407.13764 (published 5 days ago)
- DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation — arXiv:2407.11394 (published 8 days ago)
- Scaling Diffusion Transformers to 16 Billion Parameters — arXiv:2407.11633 (published 7 days ago)
- Generalizable Implicit Motion Modeling for Video Frame Interpolation — arXiv:2407.08680 (published 12 days ago)
- OmniNOCS: A Unified NOCS Dataset and Model for 3D Lifting of 2D Objects — arXiv:2407.08711 (published 12 days ago)
- VEnhancer: Generative Space-Time Enhancement for Video Generation — arXiv:2407.07667 (published 13 days ago)
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively — arXiv:2401.02955 (published Jan 5)
- MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos — arXiv:2406.08407 (published Jun 12)
- SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound — arXiv:2406.06612 (published Jun 6)
- MagicPose4D: Crafting Articulated Models with Appearance and Motion Control — arXiv:2405.14017 (published May 22)
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion — arXiv:2406.03184 (published Jun 5)
- V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation — arXiv:2406.02511 (published Jun 4)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors — arXiv:2406.01493 (published Jun 3)
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation — arXiv:2405.20674 (published May 31)
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback — arXiv:2405.18750 (published May 29)
- Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning — arXiv:2405.18386 (published May 28)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control — arXiv:2405.17414 (published May 27)
- Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition — arXiv:2405.15216 (published May 24)
- Part123: Part-aware 3D Reconstruction from a Single-view Image — arXiv:2405.16888 (published May 27)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer — arXiv:2405.17405 (published May 27)
- Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels — arXiv:2405.16822 (published May 27)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models — arXiv:2405.16537 (published May 26)
- Looking Backward: Streaming Video-to-Video Translation with Feature Banks — arXiv:2405.15757 (published May 24)
- Look Once to Hear: Target Speech Hearing with Noisy Examples — arXiv:2405.06289 (published May 10)
- Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling — arXiv:2405.14847 (published May 23)
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data — arXiv:2405.14333 (published May 23)
- Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras — arXiv:2405.14866 (published May 23)
- Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices — arXiv:2405.12211 (published May 20)
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching — arXiv:2405.11252 (published May 18)
- Imp: Highly Capable Large Multimodal Models for Mobile Devices — arXiv:2405.12107 (published May 20)
- FIFO-Diffusion: Generating Infinite Videos from Text without Training — arXiv:2405.11473 (published May 19)
- INDUS: Effective and Efficient Language Models for Scientific Applications — arXiv:2405.10725 (published May 17)
- TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction — arXiv:2405.10315 (published May 16)
- Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion — arXiv:2405.09874 (published May 16)
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models — arXiv:2405.10314 (published May 16)
- Compositional Text-to-Image Generation with Dense Blob Representations — arXiv:2405.08246 (published May 14)
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning — arXiv:2405.08054 (published May 13)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots — arXiv:2405.07990 (published May 13)
- LogoMotion: Visually Grounded Code Generation for Content-Aware Animation — arXiv:2405.07065 (published May 11)
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation — arXiv:2405.01434 (published May 2)
- Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting — arXiv:2404.19758 (published Apr 30)
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance — arXiv:2401.16465 (published Jan 29)
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation — arXiv:2404.19427 (published Apr 30)