dashfunnydashdash (J)

upvoted a paper about 9 hours ago

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published 13 days ago • 25

upvoted a paper 2 days ago

Shape of Motion: 4D Reconstruction from a Single Video

Paper • 2407.13764 • Published 3 days ago • 14

upvoted 2 papers 4 days ago

Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

Paper • 2407.11793 • Published 5 days ago • 3

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Paper • 2407.10957 • Published 6 days ago • 23

upvoted a paper 5 days ago

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published 6 days ago • 20

upvoted a paper 8 days ago

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Paper • 2407.08296 • Published 11 days ago • 28

upvoted a paper 12 days ago

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published 13 days ago • 19

upvoted 3 papers 13 days ago

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published 17 days ago • 33

Unveiling Encoder-Free Vision-Language Models

Paper • 2406.11832 • Published Jun 17 • 45

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Paper • 2406.08085 • Published Jun 12 • 11

upvoted 3 papers 18 days ago

Revealing Fine-Grained Values and Opinions in Large Language Models

Paper • 2406.19238 • Published 24 days ago • 13

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published 19 days ago • 23

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Paper • 2407.02371 • Published 19 days ago • 47

upvoted a paper 19 days ago

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published 23 days ago • 37

upvoted a paper 20 days ago

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

Paper • 2406.18462 • Published 25 days ago • 11

upvoted a paper 23 days ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published 24 days ago • 51

upvoted 2 papers 24 days ago

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published 25 days ago • 25

A Closer Look into Mixture-of-Experts in Large Language Models

Paper • 2406.18219 • Published 26 days ago • 14

upvoted a paper 25 days ago

Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published 27 days ago • 65

upvoted a paper 26 days ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published 27 days ago • 52

upvoted 14 papers about 1 month ago

nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Paper • 2406.14347 • Published Jun 20 • 99

The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published Jun 15 • 65

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Paper • 2406.12034 • Published Jun 17 • 12

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 28

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Paper • 2406.05132 • Published Jun 7 • 27

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 38

Simplified and Generalized Masked Diffusion for Discrete Data

Paper • 2406.04329 • Published Jun 6 • 4

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 30

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 69

upvoted 2 papers about 2 months ago

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Paper • 2406.01014 • Published Jun 3 • 29

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4 • 27

upvoted an article about 2 months ago

Uncensor any LLM with abliteration

Article • Jun 13 • 237

upvoted 3 papers about 2 months ago

μLO: Compute-Efficient Meta-Generalization of Learned Optimizers

Paper • 2406.00153 • Published May 31 • 9

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Paper • 2406.00908 • Published Jun 3 • 11

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42

upvoted an article about 2 months ago

Introduction to State Space Models (SSM)

Article • 2 days ago • 59

upvoted 16 papers about 2 months ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31 • 61

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30 • 19

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30 • 28

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30 • 26

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published May 29 • 43

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published May 27 • 11

Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 29

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 78

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published May 27 • 14

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published May 23 • 33

Aya 23: Open Weight Releases to Further Multilingual Progress

Paper • 2405.15032 • Published May 23 • 21

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 52

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

Paper • 2405.14979 • Published May 23 • 14

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 11

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published May 22 • 22

upvoted 3 papers 2 months ago

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 33

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published May 16 • 19
