Patch2Pix: Epipolar-Guided Pixel-Level Correspondences
- URL: http://arxiv.org/abs/2012.01909v3
- Date: Fri, 26 Mar 2021 19:54:41 GMT
- Title: Patch2Pix: Epipolar-Guided Pixel-Level Correspondences
- Authors: Qunjie Zhou, Torsten Sattler, Laura Leal-Taixe
- Abstract summary: We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals.
We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks.
- Score: 38.38520763114715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical matching pipeline used for visual localization typically
involves three steps: (i) local feature detection and description, (ii) feature
matching, and (iii) outlier rejection. Recently emerged correspondence networks
propose to perform those steps inside a single network but suffer from low
matching resolution due to the memory bottleneck. In this work, we propose a
new perspective to estimate correspondences in a detect-to-refine manner, where
we first predict patch-level match proposals and then refine them. We present
Patch2Pix, a novel refinement network that refines match proposals by
regressing pixel-level matches from the local regions defined by those
proposals and jointly rejecting outlier matches with confidence scores.
Patch2Pix is weakly supervised to learn correspondences that are consistent
with the epipolar geometry of an input image pair. We show that our refinement
network significantly improves the performance of correspondence networks on
image matching, homography estimation, and localization tasks. In addition, we
show that our learned refinement generalizes to fully-supervised methods
without re-training, which leads us to state-of-the-art localization
performance. The code is available at https://github.com/GrumpyZhou/patch2pix.
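The abstract says the network is weakly supervised to produce matches consistent with the epipolar geometry of the image pair. As a rough illustration of what "consistent" can mean numerically, the sketch below computes the Sampson distance, a standard first-order measure of how far a match lies from satisfying the epipolar constraint x2^T F x1 = 0. The abstract does not state which measure the authors use, so treat this as an assumption. The function name, the demo matrix `F_DEMO`, and the sample points are all hypothetical.

```python
def sampson_distance(F, p1, p2):
    """Sampson distance of the match p1 <-> p2 under fundamental matrix F.

    F  : 3x3 nested list, with the convention x2^T F x1 = 0.
    p1 : (x, y) pixel in image 1.
    p2 : (x, y) pixel in image 2.
    Assumes a non-degenerate configuration (non-zero denominator).
    """
    x1 = (p1[0], p1[1], 1.0)  # homogeneous coordinates
    x2 = (p2[0], p2[1], 1.0)
    # F @ x1: epipolar line of p1 in image 2
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    # F^T @ x2: epipolar line of p2 in image 1
    Ftx2 = [sum(F[r][c] * x2[r] for r in range(3)) for c in range(3)]
    # Algebraic epipolar residual x2^T F x1
    residual = sum(x2[r] * Fx1[r] for r in range(3))
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom


# Hypothetical geometry: a pure horizontal camera translation, for which
# epipolar lines are image rows (a correct match keeps the same y).
F_DEMO = [[0.0, 0.0, 0.0],
          [0.0, 0.0, -1.0],
          [0.0, 1.0, 0.0]]

print(sampson_distance(F_DEMO, (10.0, 20.0), (15.0, 20.0)))  # same row: 0.0
print(sampson_distance(F_DEMO, (30.0, 40.0), (33.0, 42.0)))  # two rows off: 2.0
```

A distance like this only tells how far a match lies from its epipolar line, not where along the line the true correspondence sits, which is one way to read the "weakly supervised" label in the abstract.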
Related papers
- Improving Transformer-based Image Matching by Cascaded Capturing
Spatially Informative Keypoints [44.90917854990362]
We propose a transformer-based cascade matching model -- Cascade feature Matching TRansformer (CasMTR).
We use a simple yet effective Non-Maximum Suppression (NMS) post-process to filter keypoints through the confidence map.
CasMTR achieves state-of-the-art performance in indoor and outdoor pose estimation as well as visual localization.
arXiv Detail & Related papers (2023-03-06T04:32:34Z) - Feature-based Image Matching for Identifying Individual Kākā [0.0]
This report investigates an unsupervised, feature-based image matching pipeline for the novel application of identifying individual kākā.
Combined with a similarity network for clustering, this addresses a weakness of current supervised approaches to identifying individual birds.
We conclude that feature-based image matching could be used with a similarity network to provide a viable alternative to existing supervised approaches.
arXiv Detail & Related papers (2023-01-17T03:43:19Z) - DDM-NET: End-to-end learning of keypoint feature Detection, Description
and Matching for 3D localization [34.66510265193038]
We propose an end-to-end framework that jointly learns keypoint detection, descriptor representation and cross-frame matching.
We design a self-supervised image warping correspondence loss for both feature detection and matching.
We also propose a new loss to robustly handle both definite inlier/outlier matches and less-certain matches.
arXiv Detail & Related papers (2022-12-08T21:43:56Z) - Guide Local Feature Matching by Overlap Estimation [9.387323456222823]
We introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR.
OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression.
Experiments show that OETR can boost state-of-the-art local feature matching performance substantially.
arXiv Detail & Related papers (2022-02-18T07:11:36Z) - DFM: A Performance Baseline for Deep Feature Matching [10.014010310188821]
The proposed method uses a pre-trained VGG architecture as a feature extractor and does not require any additional training specifically to improve matching.
Our algorithm achieves overall scores of 0.57 and 0.80 in terms of Mean Matching Accuracy (MMA) at 1-pixel and 2-pixel thresholds, respectively, on the HPatches dataset.
arXiv Detail & Related papers (2021-06-14T22:55:06Z) - Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z) - Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z) - Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, for each guided point, a different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the
Wild [104.61677518999976]
We propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection.
The proposed model is equipped with a novel detection head based on heatmap regression.
To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum.
arXiv Detail & Related papers (2020-03-08T12:23:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.