ucyang (Unchun Yang)

upvoted a collection 1 day ago

NuExtract

Collection

4 items • Updated Jul 15 • 9

upvoted 3 papers 1 day ago

upvoted a paper 4 days ago

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 7 days ago • 71

upvoted a paper 5 days ago

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Paper • 2409.12941 • Published 15 days ago • 20

upvoted 9 papers 6 days ago

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published 9 days ago • 43

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published 10 days ago • 40

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published 14 days ago • 45

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published 15 days ago • 65

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 15 days ago • 127

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 16 days ago • 35

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published 16 days ago • 30

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 16 days ago • 69

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 16 days ago • 120

upvoted a collection 6 days ago

Flow-Judge-v0.1

Collection

Flow-Judge-v0.1 models • 5 items • Updated 17 days ago • 15

upvoted 16 papers 6 days ago

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published 17 days ago • 23

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 17 days ago • 26

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 17 days ago • 66

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 17 days ago • 80

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published 18 days ago • 33

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published 21 days ago • 30

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 23 days ago • 62

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 22 days ago • 42

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 29 days ago • 39

PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Paper • 2409.06820 • Published 24 days ago • 59

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 24 days ago • 54

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published 25 days ago • 45

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published about 1 month ago • 71

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 29 days ago • 85

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published Sep 2 • 95

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5 • 25

upvoted 3 papers 8 days ago

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Paper • 2406.06592 • Published Jun 5 • 23

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published 10 days ago • 30

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 9 days ago • 89

upvoted a collection 8 days ago

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 9 days ago • 216

upvoted a collection 9 days ago

Llama 3.2

Collection

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 9 days ago • 322

upvoted an article 11 days ago

Article

Llama-3.1 8B Carrot - Capx AI

28 days ago

• 2

upvoted an article 12 days ago

Article

Optimize and deploy models with Optimum-Intel and OpenVINO GenAI

15 days ago

• 14

upvoted 5 collections 16 days ago

Qwen2.5-Math

Collection

Math-specific model series based on Qwen2.5 • 9 items • Updated 12 days ago • 34

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 14 items • Updated 9 days ago • 69

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 16 days ago • 220

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 16 days ago • 201

Llama3-8B-1.58

Collection

A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 20 days ago • 9

upvoted an article 16 days ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

17 days ago

• 144

upvoted a paper 16 days ago

Let's Verify Step by Step

Paper • 2305.20050 • Published May 31, 2023 • 9

upvoted 2 papers 18 days ago

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 21 days ago • 44

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17 • 17

upvoted a collection 22 days ago

DataGemma Release

Collection

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 22 days ago • 76

upvoted a paper 23 days ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 78

upvoted a collection 24 days ago

DeepSeek-V2.5

Collection

1 item • Updated 28 days ago • 22

upvoted 9 papers 25 days ago

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published about 1 month ago • 54

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published about 1 month ago • 27

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Paper • 2409.00509 • Published Aug 31 • 38

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published Sep 2 • 70

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 76

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29 • 92

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 83

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 137

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26 • 38
