iMatching: Imperative Correspondence Learning
- URL: http://arxiv.org/abs/2312.02141v1
- Date: Mon, 4 Dec 2023 18:58:20 GMT
- Title: iMatching: Imperative Correspondence Learning
- Authors: Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang
- Abstract summary: We introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence.
It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels.
We demonstrate superior performance on tasks including feature matching and pose estimation.
- Score: 5.974164730742711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning feature correspondence is a foundational task in computer vision,
holding immense importance for downstream applications such as visual odometry
and 3D reconstruction. Despite recent progress in data-driven models, feature
correspondence learning is still limited by the lack of accurate per-pixel
correspondence labels. To overcome this difficulty, we introduce a new
self-supervised scheme, imperative learning (IL), for training feature
correspondence. It enables correspondence learning on arbitrary uninterrupted
videos without any camera pose or depth labels, heralding a new era for
self-supervised correspondence learning. Specifically, we formulate the
problem of correspondence learning as a bilevel optimization, which takes the
reprojection error from bundle adjustment as a supervisory signal for the
model. To avoid large memory and computation overhead, we leverage the
stationary point to effectively back-propagate the implicit gradients through
bundle adjustment. Through extensive experiments, we demonstrate superior
performance on tasks including feature matching and pose estimation, in which
we obtain an average 30% accuracy gain over the state-of-the-art matching
models.
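The bilevel formulation above can be illustrated with a toy sketch. This is an assumption-laden analogue, not the authors' implementation: the quadratic inner problem stands in for bundle adjustment, the outer loss for the reprojection-error supervision, and `theta` for the matching network's parameters. The point of the sketch is the stationary-point trick: at the inner optimum, the implicit function theorem gives the gradient of the outer loss with respect to `theta` without unrolling the inner solver.

```python
import numpy as np

lam = 0.5                        # damping weight in the inner objective (hypothetical)
target = np.array([1.0, -2.0])   # stand-in for the reprojection-error target

def inner_objective_grad(x, theta):
    """Gradient of the inner objective g(x, theta) = 0.5||x - theta||^2 + 0.5*lam*||x||^2."""
    return (x - theta) + lam * x

def solve_inner(theta, steps=200, lr=0.5):
    """Solve the inner problem by gradient descent (stands in for a BA solver)."""
    x = np.zeros_like(theta)
    for _ in range(steps):
        x -= lr * inner_objective_grad(x, theta)
    return x

def outer_loss(x):
    return 0.5 * np.sum((x - target) ** 2)

def implicit_grad(theta):
    """Back-propagate through the inner solve via the implicit function theorem.

    At the stationary point, F(x*, theta) = 0 with F = inner_objective_grad, so
    dx*/dtheta = -(dF/dx)^{-1} dF/dtheta. Here dF/dx = (1 + lam) * I and
    dF/dtheta = -I, both known in closed form, so no solver unrolling is needed.
    """
    x_star = solve_inner(theta)
    dL_dx = x_star - target
    dx_dtheta = 1.0 / (1.0 + lam)   # scalar multiple of the identity in this toy case
    return dL_dx * dx_dtheta

theta = np.array([0.3, 0.7])
g_implicit = implicit_grad(theta)

# Sanity check against a finite-difference gradient that re-runs the inner solver.
eps = 1e-5
g_fd = np.zeros_like(theta)
for i in range(len(theta)):
    tp, tm = theta.copy(), theta.copy()
    tp[i] += eps
    tm[i] -= eps
    g_fd[i] = (outer_loss(solve_inner(tp)) - outer_loss(solve_inner(tm))) / (2 * eps)

print(np.allclose(g_implicit, g_fd, atol=1e-4))  # True
```

The memory saving the abstract alludes to comes from exactly this pattern: the implicit gradient needs only the converged inner solution, not the intermediate solver iterates.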
Related papers
- Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction [2.778647101651566]
A fundamental problem in supervised learning is to find a good set of features or distance measures.
We propose a supervised dimensionality reduction method, where the outputs of weak learners define the embedding.
We show that the embedding coordinates provide better features for the supervised learning task.
arXiv Detail & Related papers (2024-05-14T10:23:57Z) - Learning Cross-view Visual Geo-localization without Ground Truth [48.51859322439286]
Cross-View Geo-Localization (CVGL) involves determining the geographical location of a query image by matching it with a corresponding GPS-tagged reference image.
Current state-of-the-art methods rely on training models with labeled paired images, incurring substantial annotation costs and training burdens.
We investigate the adaptation of frozen models for CVGL without requiring ground truth pair labels.
arXiv Detail & Related papers (2024-03-19T13:01:57Z) - Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps [39.00415825387414]
We propose a new approach for semantic correspondence estimation that supplements discriminative features with 3D understanding via a weak geometric spherical prior.
Compared to more involved 3D pipelines, our model only requires weak viewpoint information, and the simplicity of our spherical representation enables us to inject informative geometric priors into the model during training.
We present results on the challenging SPair-71k dataset, where our approach demonstrates the capability to distinguish between symmetric views and repeated parts across many object categories.
arXiv Detail & Related papers (2023-12-20T17:35:24Z) - Match me if you can: Semantic Correspondence Learning with Unpaired Images [82.05105090432025]
We propose a simple yet effective method that performs training with unlabeled pairs to complement both limited image pairs and sparse point pairs.
Using a simple teacher-student framework, we offer reliable pseudo correspondences to the student network via machine supervision.
Our models outperform the milestone baselines, including state-of-the-art methods on semantic correspondence benchmarks.
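The teacher-student scheme described above follows a generic pseudo-labelling pattern, sketched below under stated assumptions: the teacher interface and confidence threshold are hypothetical, and the paper's actual "machine supervision" differs in detail. The idea is simply that confident teacher matches on unlabeled pairs become pseudo ground truth for the student.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(n_points=100):
    """Stand-in for a frozen teacher matcher: candidate matches plus confidences.
    (Hypothetical interface, for illustration only.)"""
    matches = rng.uniform(0, 256, size=(n_points, 4))   # rows of (x1, y1, x2, y2)
    confidence = rng.uniform(0, 1, size=n_points)
    return matches, confidence

def select_pseudo_labels(matches, confidence, tau=0.7):
    """Keep only high-confidence teacher matches as pseudo ground truth."""
    return matches[confidence >= tau]

def student_loss(student_targets, pseudo_labels):
    """Supervise the student's predicted target points with the teacher's."""
    return float(np.mean((student_targets - pseudo_labels[:, 2:]) ** 2))

matches, conf = teacher_predict()
pseudo = select_pseudo_labels(matches, conf)
# A student that reproduces the teacher on the kept points incurs zero loss.
loss = student_loss(pseudo[:, 2:], pseudo)
print(len(pseudo), loss)
```

The threshold `tau` trades pseudo-label coverage against noise; real methods typically add geometric or cycle-consistency filtering on top of raw confidence.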
arXiv Detail & Related papers (2023-11-30T13:22:15Z) - Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature [81.25511385257344]
We present a novel solution, Q-REG, which utilizes rich geometric information to estimate the rigid pose from a single correspondence.
Q-REG allows us to formalize the robust estimation as an exhaustive search, thereby enabling end-to-end training.
We demonstrate in the experiments that Q-REG is agnostic to the correspondence matching method and provides consistent improvement both when used only in inference and in end-to-end training.
arXiv Detail & Related papers (2023-09-27T20:58:53Z) - S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that governs hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z) - Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z) - Warp Consistency for Unsupervised Learning of Dense Correspondences [116.56251250853488]
A key challenge in learning dense correspondences is the lack of ground-truth matches for real image pairs.
We propose Warp Consistency, an unsupervised learning objective for dense correspondence regression.
Our approach sets a new state-of-the-art on several challenging benchmarks, including MegaDepth, RobotCar and TSS.
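The warp-consistency objective can be illustrated with a toy version of its bipath constraint. This is my reading of the idea, heavily simplified: a known synthetic warp `W` is applied to the second image, and the predicted direct flow must agree with the predicted real-pair flow composed with `W`. Restricting the sketch to pure translations makes flow composition exact addition, avoiding grid sampling; the real method composes dense flows.

```python
import numpy as np

def bipath_loss(flow_direct, flow_real_pair, synthetic_warp):
    """L2 penalty between the direct flow and the composed (bipath) flow.

    Flows are H x W x 2 displacement fields. For translation-only warps,
    composing flows reduces to elementwise addition (an approximation
    chosen to keep this sketch exact and short).
    """
    composed = flow_real_pair + synthetic_warp
    return float(np.mean((flow_direct - composed) ** 2))

w = np.full((4, 4, 2), [1.0, -0.5])       # known synthetic warp W (translation)
f_real = np.full((4, 4, 2), [0.25, 0.75]) # predicted flow for the real pair
f_good = f_real + w                       # consistent direct prediction
f_bad = f_good + 0.5                      # inconsistent direct prediction

print(bipath_loss(f_good, f_real, w))     # 0.0
print(bipath_loss(f_bad, f_real, w))      # 0.25
```

Because `W` is sampled and therefore known exactly, this objective supervises real image pairs without any ground-truth matches, which is the point made in the summary above.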
arXiv Detail & Related papers (2021-04-07T17:58:22Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.