Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction • Paper • 2409.18124 • Published 21 days ago • 29
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • Paper • 2410.02073 • Published 14 days ago • 38
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation • Paper • 2410.01912 • Published 15 days ago • 13
Molmo • Collection • Artifacts for open multimodal language models. • 5 items • Updated 21 days ago • 250
Qwen2.5 • Collection • Qwen2.5 language models, both pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 29 days ago • 256