- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 9
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 33
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections
Collections including paper arxiv:2406.04325
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- SF-V: Single Forward Video Generation Model
  Paper • 2406.04324 • Published • 22
- VideoTetris: Towards Compositional Text-to-Video Generation
  Paper • 2406.04277 • Published • 21
- Vript: A Video Is Worth Thousands of Words
  Paper • 2406.06040 • Published • 19
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
  Paper • 2406.11833 • Published • 61
- Depth Anything V2
  Paper • 2406.09414 • Published • 88
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 76
- Vript: A Video Is Worth Thousands of Words
  Paper • 2406.06040 • Published • 19
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 42
- Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
  Paper • 2405.21075 • Published • 15
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 22
- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 50
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 38
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- SF-V: Single Forward Video Generation Model
  Paper • 2406.04324 • Published • 22
- I4VGen: Image as Stepping Stone for Text-to-Video Generation
  Paper • 2406.02230 • Published • 15
- VideoTetris: Towards Compositional Text-to-Video Generation
  Paper • 2406.04277 • Published • 21
- SF-V: Single Forward Video Generation Model
  Paper • 2406.04324 • Published • 22
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69
- LanguageBind/MoE-LLaVA-Phi2-2.7B-4e
  Text Generation • Updated • 676 • 37
- LanguageBind/LanguageBind_Video_FT
  Zero-Shot Image Classification • Updated • 166k • 3
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 200k • 2.38k
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 69