Fine-tuning strategy for multiple tasks
#32 opened by gennarino80
I would like to fine-tune this model on a specific set of images, combining two different tasks used in cascade.
The idea is that, once it receives the input image, the model should first perform the image captioning task (MORE_DETAILED_CAPTION) to describe the image, and then run CAPTION_TO_PHRASE_GROUNDING on that description to get a 'visual perspective' of what the model has described (a sort of Grad-CAM for the text).
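To make the cascade concrete, here is a minimal sketch of the inference flow. It assumes the task tokens are written `<MORE_DETAILED_CAPTION>` and `<CAPTION_TO_PHRASE_GROUNDING>`, and that the grounding prompt is the task token followed directly by the caption text. `run_task` is a hypothetical wrapper around the model (preprocess, generate, decode), passed in as a parameter so the flow can be shown without loading any weights.

```python
from typing import Any, Callable, Dict

# Assumed task tokens, based on the task names mentioned in the post.
CAPTION_TASK = "<MORE_DETAILED_CAPTION>"
GROUNDING_TASK = "<CAPTION_TO_PHRASE_GROUNDING>"

# Hypothetical signature: run_task(image, prompt) -> decoded model output as text.
RunTask = Callable[[Any, str], str]


def cascade(image: Any, run_task: RunTask) -> Dict[str, str]:
    """Caption the image, then ground the phrases of that caption in the same image."""
    caption = run_task(image, CAPTION_TASK)
    # Assumption: the grounding prompt is the task token immediately followed by the caption.
    grounding = run_task(image, GROUNDING_TASK + caption)
    return {"caption": caption, "grounding": grounding}


# Usage with a stub in place of the real model, just to show the data flow.
def fake_run_task(image: Any, prompt: str) -> str:
    if prompt == CAPTION_TASK:
        return "A dog sits on a red sofa."
    # Echo the caption part of the grounding prompt back.
    return "boxes for: " + prompt[len(GROUNDING_TASK):]


result = cascade(image=None, run_task=fake_run_task)
print(result)
```

Because the second call consumes the first call's text, any captioning errors propagate into the grounding step. This is why the quality of the caption stage matters for the cascade as a whole.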
What should I do in this case? Should I fine-tune the model twice, first on the image captioning task and then using the resulting model as the starting point for training on the second task?
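The two-stage plan described in the question could be organized as two datasets built from the same annotated images: stage 1 pairs each image with its reference caption, and stage 2 conditions grounding on that caption, mirroring the cascade. This is only a sketch under stated assumptions. `Sample`, `stage_examples`, and the file names are hypothetical. The task-token prompt format is assumed, and the grounding target format is left as a placeholder because the post does not specify it. The actual training loop (stage 1 from the base checkpoint, stage 2 from the stage-1 checkpoint) is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed task tokens, based on the task names mentioned in the post.
CAPTION_TASK = "<MORE_DETAILED_CAPTION>"
GROUNDING_TASK = "<CAPTION_TO_PHRASE_GROUNDING>"

# (image_path, prompt, target) triple fed to a fine-tuning loop.
Example = Tuple[str, str, str]


@dataclass
class Sample:
    image_path: str
    caption: str           # reference detailed caption for the image
    grounding_target: str  # reference grounding output (format not specified; hypothetical)


def stage_examples(samples: List[Sample]) -> Tuple[List[Example], List[Example]]:
    """Build the training examples for stage 1 (captioning) and stage 2 (grounding)."""
    stage1 = [(s.image_path, CAPTION_TASK, s.caption) for s in samples]
    # Stage 2 prompts with the reference caption, as the cascade would at inference time.
    stage2 = [(s.image_path, GROUNDING_TASK + s.caption, s.grounding_target) for s in samples]
    return stage1, stage2


samples = [Sample("img_001.jpg", "A dog sits on a red sofa.", "<hypothetical grounding target>")]
stage1, stage2 = stage_examples(samples)
print(stage1[0])
print(stage2[0])
```

In this layout, stage 2 trains on reference captions while inference feeds it model-generated ones. That mismatch is worth keeping in mind when comparing the sequential plan with other schedules.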