taesiri (taesiri)

upvoted 6 papers about 13 hours ago

HEMM: Holistic Evaluation of Multimodal Foundation Models

Paper • 2407.03418 • Published 6 days ago • 5

On scalable oversight with weak LLMs judging strong LLMs

Paper • 2407.04622 • Published 4 days ago • 9

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Paper • 2407.02855 • Published 6 days ago • 6

ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published 5 days ago • 13

AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

Paper • 2407.04363 • Published 4 days ago • 20

Unveiling Encoder-Free Vision-Language Models

Paper • 2406.11832 • Published 22 days ago • 34

upvoted a paper 2 days ago

Searching for Best Practices in Retrieval-Augmented Generation

Paper • 2407.01219 • Published 8 days ago • 8

upvoted 2 papers 6 days ago

Revealing Fine-Grained Values and Opinions in Large Language Models

Paper • 2406.19238 • Published 12 days ago • 12

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Paper • 2407.01370 • Published 8 days ago • 72

upvoted 10 papers 7 days ago

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Paper • 2407.00106 • Published 12 days ago • 5

Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs

Paper • 2407.00653 • Published 9 days ago • 8

Wavelets Are All You Need for Autoregressive Image Generation

Paper • 2406.19997 • Published 11 days ago • 26

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Paper • 2407.00114 • Published 12 days ago • 12

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published 12 days ago • 24

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

Paper • 2407.00782 • Published 9 days ago • 21

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published 10 days ago • 34

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Paper • 2407.00468 • Published 10 days ago • 35

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Paper • 2406.19741 • Published 11 days ago • 54

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published 8 days ago • 69

upvoted 3 papers 8 days ago

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 34

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Paper • 2406.19280 • Published 12 days ago • 55

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published 11 days ago • 81

upvoted 5 papers 10 days ago

AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Paper • 2406.10900 • Published 23 days ago • 11

Dataset Size Recovery from LoRA Weights

Paper • 2406.19395 • Published 12 days ago • 17

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Paper • 2406.19263 • Published 12 days ago • 9

Is Programming by Example solved by LLMs?

Paper • 2406.08316 • Published 27 days ago • 11

Aligning Teacher with Student Preferences for Tailored Training Data Generation

Paper • 2406.19227 • Published 12 days ago • 22

upvoted a paper 11 days ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published 12 days ago • 51

upvoted 4 papers 12 days ago

Investigating Data Contamination in Modern Benchmarks for Large Language Models

Paper • 2311.09783 • Published Nov 16, 2023 • 2

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Paper • 2406.17294 • Published 14 days ago • 9

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published 13 days ago • 25

Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published 15 days ago • 63

upvoted 3 papers 13 days ago

upvoted a paper 14 days ago

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published 15 days ago • 23

upvoted a paper 15 days ago

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published 21 days ago • 34

upvoted a collection 17 days ago

4M Models

Collection

Multimodal models from https://4m.epfl.ch/ • 14 items • Updated 24 days ago • 29

upvoted 6 papers 18 days ago

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Paper • 2406.14562 • Published 19 days ago • 27

REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark

Paper • 2406.11927 • Published 22 days ago • 8

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Paper • 2406.11896 • Published 25 days ago • 17

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Paper • 2406.13923 • Published 19 days ago • 21

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published 19 days ago • 28

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published 19 days ago • 33

upvoted 3 papers 19 days ago

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Paper • 2406.12649 • Published 21 days ago • 15

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Paper • 2406.11230 • Published 22 days ago • 34

Long Code Arena: a Set of Benchmarks for Long-Context Code Models

Paper • 2406.11612 • Published 22 days ago • 20

upvoted 5 papers 20 days ago

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Paper • 2406.11912 • Published 23 days ago • 25

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published 22 days ago • 54

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published 21 days ago • 14

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published 21 days ago • 28

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published 22 days ago • 39

upvoted a paper 21 days ago

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published 23 days ago • 11

upvoted 7 papers 22 days ago

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published 25 days ago • 8

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Paper • 2406.10118 • Published 25 days ago • 25

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Paper • 2406.08451 • Published 27 days ago • 23

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published 25 days ago • 21

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Paper • 2406.10149 • Published 25 days ago • 47

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Paper • 2406.09961 • Published 25 days ago • 54

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published 26 days ago • 85
