Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality Paper • 2410.05210 • Published 9 days ago • 9 • 3