deepkyu (Hyoung-Kyu Song)

upvoted a paper about 1 month ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4 • 85

upvoted a paper about 2 months ago

Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields

Paper • 2408.03822 • Published Aug 7 • 9

upvoted 5 papers 2 months ago

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26 • 30

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

Paper • 2407.17438 • Published Jul 24 • 23

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Paper • 2407.17470 • Published Jul 24 • 14

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Paper • 2407.17365 • Published Jul 24 • 11

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 38

upvoted 3 papers 3 months ago

upvoted 3 papers 6 months ago

Pegasus-v1 Technical Report

Paper • 2404.14687 • Published Apr 23 • 30

EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18 • 21

sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 38

upvoted a collection 7 months ago

Transformers.js demos

Collection

A collection of my favorite WebML demos, built with Transformers.js! • 30 items • Updated Jul 11 • 80

upvoted a paper 7 months ago

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 185

upvoted 7 papers 8 months ago

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19

Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20 • 29

Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 64

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Paper • 2402.03040 • Published Feb 5 • 17

YOLO-World: Real-Time Open-Vocabulary Object Detection

Paper • 2401.17270 • Published Jan 30 • 32

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Paper • 2401.17093 • Published Jan 30 • 18

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 67

upvoted 3 papers 10 months ago

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 19

Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 17

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Paper • 2312.03209 • Published Dec 6, 2023 • 17

upvoted a collection 11 months ago

Distil-Whisper Models

Collection

The first version of the Distil-Whisper models released with the Distil-Whisper paper. • 4 items • Updated Mar 21 • 35

upvoted 2 papers 11 months ago

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Paper • 2310.18628 • Published Oct 28, 2023 • 7

upvoted 4 papers 12 months ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 120

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 39

Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 40

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Paper • 2310.13289 • Published Oct 20, 2023 • 17

upvoted a collection 12 months ago

Historical - Spaces of the Week

Collection

All Spaces of the Week...from all weeks • 636 items • Updated Jan 17 • 19

upvoted 3 papers 12 months ago

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

Paper • 2210.04885 • Published Oct 10, 2022 • 1

4K4D: Real-Time 4D View Synthesis at 4K Resolution

Paper • 2310.11448 • Published Oct 17, 2023 • 36

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Paper • 2310.08659 • Published Oct 12, 2023 • 22

upvoted 12 papers about 1 year ago

NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions

Paper • 2309.15426 • Published Sep 27, 2023 • 14

RMT: Retentive Networks Meet Vision Transformers

Paper • 2309.11523 • Published Sep 20, 2023 • 33

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82

FreeU: Free Lunch in Diffusion U-Net

Paper • 2309.11497 • Published Sep 20, 2023 • 64

Replacing softmax with ReLU in Vision Transformers

Paper • 2309.08586 • Published Sep 15, 2023 • 17

MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27

Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 25

Rethinking Vision Transformers for MobileNet Size and Speed

Paper • 2212.08059 • Published Dec 15, 2022 • 4

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Paper • 2303.14189 • Published Mar 24, 2023 • 3

Tracking Anything in High Quality

Paper • 2307.13974 • Published Jul 26, 2023 • 13

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

Paper • 2307.10373 • Published Jul 19, 2023 • 57

Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170

upvoted 2 papers over 1 year ago

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Paper • 2306.14289 • Published Jun 25, 2023 • 15

TryOnDiffusion: A Tale of Two UNets

Paper • 2306.08276 • Published Jun 14, 2023 • 72

deepkyu's activity