gary109 (蓋瑞王)

upvoted a paper 3 days ago

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Paper • 2404.16130 • Published Apr 24 • 3

upvoted a paper 10 days ago

Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published 12 days ago • 11

upvoted a paper 13 days ago

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published 17 days ago • 33

upvoted 4 papers 14 days ago

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published 19 days ago • 15

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

Paper • 2407.03321 • Published 18 days ago • 14

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Paper • 2407.01906 • Published 20 days ago • 33

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Paper • 2407.01392 • Published 20 days ago • 39

upvoted a paper 19 days ago

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published 20 days ago • 72

upvoted 24 papers 25 days ago

MotionBooth: Motion-Aware Customized Text-to-Video Generation

Paper • 2406.17758 • Published 26 days ago • 18

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Paper • 2406.18518 • Published 25 days ago • 22

Repulsive Score Distillation for Diverse Sampling of Diffusion Models

Paper • 2406.16683 • Published 27 days ago • 4

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Paper • 2406.16772 • Published 27 days ago • 2

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Paper • 2406.16008 • Published 29 days ago • 6

IRASim: Learning Interactive Real-Robot Action Simulators

Paper • 2406.14540 • Published Jun 20 • 6

ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

Paper • 2406.16815 • Published 27 days ago • 7

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

Paper • 2406.14051 • Published Jun 20 • 9

Confidence Regulation Neurons in Language Models

Paper • 2406.16254 • Published 28 days ago • 10

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

Paper • 2406.16714 • Published 27 days ago • 10

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Paper • 2406.16747 • Published 27 days ago • 16

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Paper • 2406.16758 • Published 27 days ago • 18

WARP: On the Benefits of Weight Averaged Rewarded Policies

Paper • 2406.16768 • Published 27 days ago • 21

Efficient Continual Pre-training by Mitigating the Stability Gap

Paper • 2406.14833 • Published about 1 month ago • 19

Scaling Laws for Linear Complexity Language Models

Paper • 2406.16690 • Published 27 days ago • 21

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published 28 days ago • 23

Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published 28 days ago • 28

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published 27 days ago • 32

Evaluating D-MERIT of Partial-annotation on Information Retrieval

Paper • 2406.16048 • Published 29 days ago • 34

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published 27 days ago • 52

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published 29 days ago • 43

upvoted a paper 26 days ago

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published 27 days ago • 53

upvoted 10 papers 27 days ago

Jailbreaking as a Reward Misspecification Problem

Paper • 2406.14393 • Published Jun 20 • 12

Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Paper • 2406.15193 • Published about 1 month ago • 12

Interface Design for Self-Supervised Speech Models

Paper • 2406.12209 • Published Jun 18 • 6

Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework

Paper • 2406.14783 • Published about 1 month ago • 15

EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

Paper • 2406.13457 • Published Jun 19 • 15

Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

Paper • 2406.14213 • Published Jun 20 • 20

Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

Paper • 2406.14599 • Published Jun 20 • 16

Towards Retrieval Augmented Generation over Large Video Libraries

Paper • 2406.14938 • Published about 1 month ago • 18

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Paper • 2406.15319 • Published about 1 month ago • 57

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published Jun 18 • 35

upvoted 17 papers about 1 month ago

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5

Can Large Language Models Be an Alternative to Human Evaluations?

Paper • 2305.01937 • Published May 3, 2023 • 2

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18 • 29

Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14 • 37

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 54

TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18 • 34

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Paper • 2406.11768 • Published Jun 17 • 20

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13 • 30

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 28

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Paper • 2406.07522 • Published Jun 11 • 35

Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13 • 43

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 88

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Paper • 2406.10209 • Published Jun 14 • 8

Decoding the Diversity: A Review of the Indic AI Research Landscape

Paper • 2406.09559 • Published Jun 13 • 5

MaskLID: Code-Switching Language Identification through Iterative Masking

Paper • 2406.06263 • Published Jun 10 • 5

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

Paper • 2406.10111 • Published Jun 14 • 6

Training-free Camera Control for Video Generation

Paper • 2406.10126 • Published Jun 14 • 12
