COTR: Correspondence Transformer for Matching Across Images
- URL: http://arxiv.org/abs/2103.14167v1
- Date: Thu, 25 Mar 2021 22:47:02 GMT
- Title: COTR: Correspondence Transformer for Matching Across Images
- Authors: Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasacchi, Kwang Moo
Yi
- Abstract summary: We propose a novel framework for finding correspondences in images based on a deep neural network.
By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
- Score: 31.995943755283786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for finding correspondences in images based on a
deep neural network that, given two images and a query point in one of them,
finds its correspondence in the other. By doing so, one has the option to query
only the points of interest and retrieve sparse correspondences, or to query
all points in an image and obtain dense mappings. Importantly, in order to
capture both local and global priors, and to let our model relate between image
regions using the most relevant among said priors, we realize our network using
a transformer. At inference time, we apply our correspondence network by
recursively zooming in around the estimates, yielding a multiscale pipeline
able to provide highly-accurate correspondences. Our method significantly
outperforms the state of the art on both sparse and dense correspondence
problems on multiple datasets and tasks, ranging from wide-baseline stereo to
optical flow, without any retraining for a specific dataset. We commit to
releasing data, code, and all the tools necessary to train from scratch and
ensure reproducibility.
Related papers
- Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
arXiv Detail & Related papers (2022-10-18T12:49:04Z) - Correlation Verification for Image Retrieval [15.823918683848877]
We propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet)
CVNet compresses dense feature correlation into image similarity while learning diverse geometric matching patterns from various image pairs.
Our proposed network shows state-of-the-art performance on several retrieval benchmarks with a significant margin.
arXiv Detail & Related papers (2022-04-04T13:18:49Z) - DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor
Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem.
We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points.
Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z) - Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z) - Few-shot Segmentation with Optimal Transport Matching and Message Flow [50.9853556696858]
It is essential for few-shot semantic segmentation to fully utilize the support information.
We propose a Correspondence Matching Network (CMNet) with an Optimal Transport Matching module.
Experiments on PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art few-shot segmentation performance.
arXiv Detail & Related papers (2021-08-19T06:26:11Z) - LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglecting the explicit formulation of correspondences between them, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z) - Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with
Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach that consists of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive as retrieval scales.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z) - Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z) - Image Retrieval for Structure-from-Motion via Graph Convolutional
Network [13.040952255039702]
We present a novel retrieval method based on Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy.
By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to exploit whether nodes in the subgraph have overlapping regions with the query photograph.
Experiments demonstrate that our method performs remarkably well on the challenging dataset of highly ambiguous and duplicated scenes.
arXiv Detail & Related papers (2020-09-17T04:03:51Z) - Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using
Transformer Encoders [14.634046503477979]
We present a novel approach called Transformer Reasoning and Alignment Network (TERAN)
TERAN enforces a fine-grained match between the underlying components of images and sentences.
On the MS-COCO 1K test set, we obtain an improvement of 5.7% and 3.5% respectively on the image and the sentence retrieval tasks.
arXiv Detail & Related papers (2020-08-12T11:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.