juandavidgf (juand4bot)

upvoted 8 papers about 2 months ago

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Paper • 2408.05939 • Published Aug 12 • 13

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115

upvoted a paper 2 months ago

ShieldGemma: Generative AI Content Moderation Based on Gemma

Paper • 2407.21772 • Published Jul 31 • 13

upvoted 3 papers 3 months ago

Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12 • 56

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24 • 32

Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24 • 28

upvoted 8 papers 4 months ago

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14 • 9

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

Paper • 2406.08845 • Published Jun 13 • 8

Designing a Dashboard for Transparency and Control of Conversational AI

Paper • 2406.07882 • Published Jun 12 • 9

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13 • 85

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14 • 76

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 32

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6 • 34

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 11

upvoted 3 papers 5 months ago

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25 • 16

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published Apr 25 • 18

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

upvoted 7 papers 6 months ago

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23 • 59

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124

FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17 • 32

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 43

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

upvoted a collection 6 months ago

HF-curated models available on Workers AI

Collection

A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 50

upvoted 3 papers 6 months ago

Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1 • 11

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15

Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1 • 16

upvoted 26 papers 7 months ago

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Paper • 2403.14468 • Published Mar 21 • 21

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

Paper • 2403.13535 • Published Mar 20 • 21

When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19 • 25

TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

Paper • 2403.12906 • Published Mar 19 • 4

LightIt: Illumination Modeling and Control for Diffusion Models

Paper • 2403.10615 • Published Mar 15 • 16

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8 • 31

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 66

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13 • 34

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 47

DragAnything: Motion Control for Anything using Entity Representation

Paper • 2403.07420 • Published Mar 12 • 12

V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11 • 28

VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 27

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 90

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 59

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

Paper • 2402.17245 • Published Feb 27 • 10

Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 16

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 185

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26 • 23

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

Paper • 2402.15491 • Published Feb 23 • 13

Seamless Human Motion Composition with Blended Positional Encodings

Paper • 2402.15509 • Published Feb 23 • 14

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 70

Divide-or-Conquer? Which Part Should You Distill Your LLM?

Paper • 2402.15000 • Published Feb 22 • 22

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 19

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 108

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Paper • 2402.14650 • Published Feb 22 • 6

juand4bot

AI & ML interests

Organizations

juandavidgf's activity