---
title: OmniGlue - Feature Matching
emoji: 🦀
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
short_description: Feature Matching with Foundation Model Guidance
---

# \[CVPR'24\] Code release for OmniGlue

[Hanwen Jiang](https://hwjiang1510.github.io/), [Arjun Karpur](https://scholar.google.com/citations?user=jgSItF4AAAAJ), [Bingyi Cao](https://scholar.google.com/citations?user=7EeSOcgAAAAJ), [Qixing Huang](https://www.cs.utexas.edu/~huangqx/), [Andre Araujo](https://andrefaraujo.github.io/)

--------------------------------------------------------------------------------

[**Project Page**](https://hwjiang1510.github.io/OmniGlue/) | [**Paper**](https://arxiv.org/abs/2405.12979) | [**Usage**](#installation)

Official code release for the CVPR 2024 paper: **OmniGlue: Generalizable Feature Matching with Foundation Model Guidance**.

![og_diagram.png](res/og_diagram.png "og_diagram.png")

**Abstract:** The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of 6 datasets with varied image domains, including scene-level, object-centric and aerial images.
OmniGlue's novel components lead to relative gains on unseen domains of 18.8% with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by 10.1% relatively.

## Installation

First, use pip to install `omniglue`:

```sh
conda create -n omniglue pip
conda activate omniglue
git clone https://github.com/google-research/omniglue.git
cd omniglue
pip install -e .
```

Then, download the following models to `./models/`:

```sh
# Download to ./models/ dir.
mkdir models
cd models

# SuperPoint.
git clone https://github.com/rpautrat/SuperPoint.git
mv SuperPoint/pretrained_models/sp_v6.tgz . && rm -rf SuperPoint
tar zxvf sp_v6.tgz && rm sp_v6.tgz

# DINOv2 - vit-b14.
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth

# OmniGlue.
wget https://storage.googleapis.com/omniglue/og_export.zip
unzip og_export.zip && rm og_export.zip
```

Direct download links:

- [[SuperPoint weights]](https://github.com/rpautrat/SuperPoint/tree/master/pretrained_models): from [github.com/rpautrat/SuperPoint](https://github.com/rpautrat/SuperPoint)
- [[DINOv2 weights]](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth): from [github.com/facebookresearch/dinov2](https://github.com/facebookresearch/dinov2) (ViT-B/14 distilled backbone, without registers)
- [[OmniGlue weights]](https://storage.googleapis.com/omniglue/og_export.zip)

## Usage

The code snippet below outlines how you can perform OmniGlue inference in your own Python codebase.

```py
import omniglue

image0 = ...  # load images from file into np.array
image1 = ...

og = omniglue.OmniGlue(
    og_export='./models/og_export',
    sp_export='./models/sp_v6',
    dino_export='./models/dinov2_vitb14_pretrain.pth',
)

match_kp0s, match_kp1s, match_confidences = og.FindMatches(image0, image1)
# Output:
# match_kp0s: (N, 2) array of (x, y) coordinates in image0.
# match_kp1s: (N, 2) array of (x, y) coordinates in image1.
# match_confidences: N-dim array of match confidence scores, one per match.
```

## Demo

`demo.py` contains example usage of the `omniglue` module. To try with your own images, replace `./res/demo1.jpg` and `./res/demo2.jpg` with your own filepaths.

```sh
conda activate omniglue
python demo.py ./res/demo1.jpg ./res/demo2.jpg
```

Expected output:

![demo_output.png](res/demo_output.png "demo_output.png")

## Repo TODOs

- ~~Provide `demo.py` example usage script.~~
- Support matching for pre-extracted features.
- Release eval pipelines for in-domain (MegaDepth).
- Release eval pipelines for all out-of-domain datasets.

## BibTex

```
@inproceedings{jiang2024Omniglue,
  title={OmniGlue: Generalizable Feature Matching with Foundation Model Guidance},
  author={Jiang, Hanwen and Karpur, Arjun and Cao, Bingyi and Huang, Qixing and Araujo, Andre},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}
```

--------------------------------------------------------------------------------

This is not an officially supported Google product.
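As a complement to the usage snippet above: since `FindMatches` returns per-match confidences, a common post-processing step is to keep only high-confidence matches before downstream use. The sketch below shows this with plain NumPy; the `filter_matches` helper and the `0.02` threshold are illustrative assumptions, not part of the `omniglue` API, and the arrays here are mock data standing in for real `FindMatches` outputs.

```python
import numpy as np

def filter_matches(match_kp0s, match_kp1s, match_confidences, threshold=0.02):
    """Keep only matches whose confidence exceeds `threshold`.

    Illustrative helper (not part of the omniglue package): applies the
    same boolean mask to both keypoint arrays and the confidence array.
    """
    keep = match_confidences > threshold
    return match_kp0s[keep], match_kp1s[keep], match_confidences[keep]

# Mock outputs with the shapes documented for og.FindMatches above.
kp0 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
kp1 = np.array([[12.0, 21.0], [33.0, 39.0], [55.0, 58.0]])
conf = np.array([0.9, 0.01, 0.5])

f0, f1, fc = filter_matches(kp0, kp1, conf, threshold=0.02)
print(fc)  # -> [0.9 0.5]  (the 0.01-confidence match is dropped)
```

Because the mask is applied to all three arrays at once, row `i` of the filtered `f0` still corresponds to row `i` of `f1`, preserving the pairing between the two images.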