Co-Attention for Conditioned Image Matching
- URL: http://arxiv.org/abs/2007.08480v2
- Date: Fri, 26 Mar 2021 17:10:13 GMT
- Title: Co-Attention for Conditioned Image Matching
- Authors: Olivia Wiles, Sebastien Ehrhardt, Andrew Zisserman
- Abstract summary: We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
- Score: 91.43244337264454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new approach to determine correspondences between image pairs in
the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating
the images independently, we instead condition on both images to implicitly
take account of the differences between them. To achieve this, we introduce (i)
a spatial attention mechanism (a co-attention module, CoAM) for conditioning
the learned features on both images, and (ii) a distinctiveness score used to
choose the best matches at test time. CoAM can be added to standard
architectures and trained using self-supervision or supervised data, and
achieves a significant performance improvement under hard conditions, e.g.
large viewpoint changes. We demonstrate that models using CoAM achieve state of
the art or competitive results on a wide range of tasks: local matching, camera
localization, 3D reconstruction, and image stylization.
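The conditioning idea in the abstract can be sketched in a few lines: features from one image attend over the features of the other, and the attended summary is concatenated back so that later layers see both images. The sketch below is a minimal NumPy illustration, not the paper's implementation; the function names, the dot-product attention form, and the "best minus second-best" distinctiveness score are all simplifying assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(feat_a, feat_b):
    """Condition features of image A on image B via cross-attention.

    feat_a, feat_b: (N, C) arrays of N spatial features with C channels
    (one row per spatial location). Returns (N, 2C): each feature of A
    concatenated with its attention-weighted summary of B.
    """
    c = feat_a.shape[1]
    attn = softmax(feat_a @ feat_b.T / np.sqrt(c), axis=-1)  # (N_a, N_b)
    attended = attn @ feat_b                                 # (N_a, C)
    return np.concatenate([feat_a, attended], axis=1)

def distinctiveness(sim_row):
    """Toy stand-in for a distinctiveness score: a match is kept when
    its best similarity clearly dominates the runner-up (the paper's
    actual score is learned, not this margin)."""
    top2 = np.sort(sim_row)[-2:]
    return top2[1] - top2[0]
```

In use, one would run `co_attention` in both directions (A conditioned on B, and B conditioned on A) and rank candidate matches by the distinctiveness score at test time.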
Related papers
- Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundation model, DINOv2.
We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions.
Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Scene-Aware Feature Matching [13.014369025829598]
We propose a novel model named SAM, which applies attentional grouping to guide Scene-Aware feature Matching.
With the scene-aware grouping guidance, SAM is not only more accurate and robust but also more interpretable than conventional feature matching models.
arXiv Detail & Related papers (2023-08-19T08:56:35Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Correlational Image Modeling for Self-Supervised Visual Pre-Training [81.82907503764775]
Correlational Image Modeling is a novel and surprisingly effective approach to self-supervised visual pre-training.
Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task.
arXiv Detail & Related papers (2023-03-22T15:48:23Z)
- Neural Congealing: Aligning Images to a Joint Semantic Atlas [14.348512536556413]
We present a zero-shot self-supervised framework for aligning semantically-common content across a set of images.
Our approach harnesses the power of pre-trained DINO-ViT features.
We show that our method performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
arXiv Detail & Related papers (2023-02-08T09:26:22Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
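The layer-selection idea in the Dynamic Hyperpixel Flow summary above can be sketched as a soft relevance weighting followed by a top-k pick over CNN layers. This is an illustrative NumPy toy, not the paper's method: the relevance logits are taken as given here (in the paper they are predicted conditioned on the image pair), and the function names are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def compose_hypercolumn(layer_feats, relevance_logits, top_k=2):
    """Compose a hypercolumn from the top-k most relevant layers.

    layer_feats: list of (H, W, C_l) feature maps from different CNN
        layers, assumed already resized to a common spatial size.
    relevance_logits: one score per layer (assumed given here).
    Returns an (H, W, sum of selected C_l) concatenated hypercolumn,
    with each selected layer scaled by its relevance weight.
    """
    weights = softmax(np.asarray(relevance_logits, dtype=float))
    chosen = np.argsort(weights)[-top_k:]  # indices of the top-k layers
    parts = [weights[i] * layer_feats[i] for i in sorted(chosen)]
    return np.concatenate(parts, axis=-1)
```

The design point the paper makes is that which layers matter depends on the image pair, so the selection happens on the fly rather than using a fixed layer subset.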
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.