Finetuning OWLv2

#6
by kevinjeswani - opened

Hi

I've tried fine-tuning this model using the docs I could find on fine-tuning vision transformers (https://huggingface.co/learn/computer-vision-course/en/unit3/vision-transformers/vision-transformer-for-objection-detection), but I'm completely lost.

Any suggestions on how to go about this or where I can find this information? How should the input dataset be structured?

Thanks!
