MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 46
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • 17 days ago • 41
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated Jun 6 • 257
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 28 days ago • 142
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 54
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 178