How to train this vision part by yourself?

#9
by CCRss - opened

Hello, I'm interested in training this ViT part on our own dataset, or at least fine-tuning it. The goal is to later combine it with a language part to build an MLLM, but this is proving difficult. Could you please give me some suggestions on how to do it?

OpenGVLab org

Hi, you can refer to our InternVL2 series models. We have already combined this vision encoder with an LLM to construct an MLLM, which you can fine-tune directly using your data. You are free to decide whether to fine-tune the vision encoder, MLP projector, or LLM, based on your needs.

For details, please see here: https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html.
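For reference, here is a minimal sketch (not taken from the InternVL docs) of how one might control which parts get fine-tuned by freezing the rest when loading an InternVL2 checkpoint through `transformers`. The component attribute names (`vision_model`, `mlp1`, `language_model`) follow the InternVL2 remote-code implementation, but you should verify them against the model you actually load; the official fine-tuning scripts linked above handle this for you.

```python
# Sketch: choose which InternVL2 components to train by freezing parameters.
# Assumes the InternVL2 remote-code model exposes vision_model / mlp1 /
# language_model attributes; verify against your checkpoint.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL2-8B",          # any InternVL2 size works the same way
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Freeze everything first, then unfreeze only the parts you want to tune.
for p in model.parameters():
    p.requires_grad = False

# Example: tune the vision encoder and the MLP projector, keep the LLM frozen.
for p in model.vision_model.parameters():
    p.requires_grad = True
for p in model.mlp1.parameters():
    p.requires_grad = True
# for p in model.language_model.parameters():
#     p.requires_grad = True           # uncomment to also fine-tune the LLM
```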

czczup changed discussion status to closed
