VILA^2: VILA Augmented VILA
Paper
•
2407.17453
•
Published
•
38
Perception and abstraction. Each modality is tokenized and embedded into vectors for model to comprehend.
Note General model is not great at specializing tasks. Narrow-domain fine-tuned checkpoint becomes better at specific tasks, such local improvement can feedback onto the full training dataset, achieving self-augmentation based improvement. This is a interesting idea.
Note Use small language model to search the graph and route to the doman expert.