johannhartmann's Collections
Multimodal Models
Updated May 31
A collection of multimodal models for the GPU poor.
google/paligemma-3b-pt-896 • Image-Text-to-Text • Updated 3 days ago • 4.24k • 97
OpenGVLab/InternVL-Chat-V1-5 • Image-Text-to-Text • Updated 1 day ago • 27.6k • 385
alexshengzhili/llava-v1.5-13b-dpo • Text Generation • Updated Apr 13 • 8 • 5
llava-hf/llava-v1.6-mistral-7b-hf • Image-Text-to-Text • Updated 2 days ago • 1.24M • 185
Qwen/Qwen-VL • Text Generation • Updated Jan 25 • 28.8k • 181
THUDM/cogvlm2-llama3-chat-19B • Text Generation • Updated May 25 • 126k • 171
BK-Lee/MoAI-7B • Updated Mar 12 • 2.37k • 45
01-ai/Yi-VL-34B • Image-Text-to-Text • Updated 26 days ago • 1.65k • 249
mPLUG/DocOwl1.5-Omni • Updated Apr 10 • 1.07k • 15
google/paligemma-3b-ft-docvqa-896 • Image-Text-to-Text • Updated 3 days ago • 1.72k • 3
Lin-Chen/open-llava-next-llama3-8b • Image-Text-to-Text • Updated May 27 • 3.91k • 23
Mizukiluke/mplug_owl_2_1 • Updated Jan 31 • 119 • 11
HuanjinYao/DenseConnector-v1.5-8B • Image-to-Text • Updated May 26 • 13 • 7
microsoft/Phi-3-vision-128k-instruct • Text Generation • Updated 4 days ago • 197k • 816