Molbap (Pablo Montalvo)

upvoted 3 articles 2 months ago

Article

Introducing TextImage Augmentation for Document Images

Aug 6

• 29

Article

MobileNet Baselines

Jul 26

• 23

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25

• 18

upvoted an article 3 months ago

Article

Mixture of Experts Explained

Dec 11, 2023

• 162

upvoted a paper 3 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 65

upvoted 2 collections 4 months ago

Searching for Better ViT Baselines

Collection

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 12

MobileNetV4 pretrained weights

Collection

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated 14 days ago • 13

upvoted 4 papers 4 months ago

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Paper • 2406.11271 • Published Jun 17 • 18

upvoted 4 articles 5 months ago

Article

AI has a problem with objectifying women

May 24

• 54

Article

MobileNet-V4 (now in timm)

Jun 17

• 37

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16

• 17

Article

License to Call: Introducing Transformers Agents 2.0

May 13

• 108

upvoted a collection 5 months ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 136

upvoted 2 articles 5 months ago

Article

2024-04-22 - Hub Incident Post Mortem

May 17

• 17

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14

• 201

upvoted a paper 6 months ago

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29

upvoted a paper 7 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 592

upvoted 4 papers 9 months ago

Small Language Model Meets with Reinforced Vision Vocabulary

Paper • 2401.12503 • Published Jan 23 • 31

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Paper • 2401.12168 • Published Jan 22 • 24

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

Paper • 2401.11944 • Published Jan 22 • 24

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16 • 35

upvoted a collection 9 months ago

SigLIP

Collection

Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated Jul 31 • 34

upvoted a paper 10 months ago

Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Paper • 2312.08361 • Published Dec 13, 2023 • 25

upvoted a paper 11 months ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 182

upvoted a paper about 1 year ago

Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170

Pablo Montalvo PRO

Articles

Introducing TextImage Augmentation for Document Images

Molbap's activity

What If We Recaption Billions of Web Images with LLaMA-3?

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
