EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19 • 31
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18 • 12
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge Paper • 2407.03958 • Published Jul 4 • 15
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Paper • 2406.16758 • Published Jun 24 • 18
TroL: Traversal of Layers for Large Language and Vision Models Paper • 2406.12246 • Published Jun 18 • 34
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13 • 48
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5 • 16
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 69
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 52
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published May 13 • 22
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation Paper • 2404.08540 • Published Apr 12 • 10
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published Apr 12 • 27
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 50
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 63
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 22
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 42
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21 • 31
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs Paper • 2403.12596 • Published Mar 19 • 9
Larimar: Large Language Models with Episodic Memory Control Paper • 2403.11901 • Published Mar 18 • 31
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 73
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements Paper • 2402.10963 • Published Feb 13 • 9