TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published 4 days ago • 27
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published 5 days ago • 46
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 4 days ago • 41
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation Paper • 2409.12941 • Published 15 days ago • 20
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 9 days ago • 43
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published 10 days ago • 40
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published 14 days ago • 45
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 15 days ago • 65
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 15 days ago • 127
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 16 days ago • 35
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 16 days ago • 69
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 17 days ago • 23
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 17 days ago • 26
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published 18 days ago • 33
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 21 days ago • 30
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 23 days ago • 62
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 22 days ago • 42
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 29 days ago • 39
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published 24 days ago • 59
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 24 days ago • 54
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 25 days ago • 45
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published about 1 month ago • 71
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published Sep 2 • 95
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 23
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published 10 days ago • 30
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 9 days ago • 89
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 9 days ago • 322
view article Article Optimize and deploy models with Optimum-Intel and OpenVINO GenAI 15 days ago • 14
Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 9 items • Updated 12 days ago • 34
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 14 items • Updated 9 days ago • 69
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 16 days ago • 220
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 16 days ago • 201
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 20 days ago • 9
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 21 days ago • 44
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning Paper • 2406.12050 • Published Jun 17 • 17
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 22 days ago • 76
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 78
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published about 1 month ago • 54
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published about 1 month ago • 27
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published Aug 31 • 38
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 137