InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published 2 days ago • 69
InternVL 2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 7 items • Updated 2 days ago • 17
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published 10 days ago • 18
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 119 items • Updated 2 days ago • 16
Article Claude-3.5 Evaluation Results on Open VLM Leaderboard By KennyUTC • 12 days ago • 4
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 15 days ago • 27
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published 15 days ago • 33
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 18 days ago • 61
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 29 days ago • 69
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
JourneyDB: A Benchmark for Generative Image Understanding Paper • 2307.00716 • Published Jul 3, 2023 • 17
MMBench: Is Your Multi-modal Model an All-around Player? Paper • 2307.06281 • Published Jul 12, 2023 • 3
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks Paper • 2404.06480 • Published Apr 9 • 1
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 14
Are We on the Right Way for Evaluating Large Vision-Language Models? Paper • 2403.20330 • Published Mar 29 • 6
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 54