SCENES: Subpixel Correspondence Estimation With Epipolar Supervision
- URL: http://arxiv.org/abs/2401.10886v1
- Date: Fri, 19 Jan 2024 18:57:46 GMT
- Title: SCENES: Subpixel Correspondence Estimation With Epipolar Supervision
- Authors: Dominik A. Kloepfer, João F. Henriques, Dylan Campbell
- Abstract summary: Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem.
Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets.
However, they require finetuning on new data, which assumes that ground-truth correspondences or ground-truth camera poses and 3D structure are available.
We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry.
- Score: 18.648772607057175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting point correspondences from two or more views of a scene is a
fundamental computer vision problem with particular importance for relative
camera pose estimation and structure-from-motion. Existing local feature
matching approaches, trained with correspondence supervision on large-scale
datasets, obtain highly-accurate matches on the test sets. However, they do not
generalise well to new datasets with different characteristics to those they
were trained on, unlike classic feature extractors. Instead, they require
finetuning, which assumes that ground-truth correspondences or ground-truth
camera poses and 3D structure are available. We relax this assumption by
removing the requirement of 3D structure, e.g., depth maps or point clouds, and
only require camera pose information, which can be obtained from odometry. We
do so by replacing correspondence losses with epipolar losses, which encourage
putative matches to lie on the associated epipolar line. While weaker than
correspondence supervision, we observe that this cue is sufficient for
finetuning existing models on new data. We then further relax the assumption of
known camera poses by using pose estimates in a novel bootstrapping approach.
We evaluate on highly challenging datasets, including an indoor drone dataset
and an outdoor smartphone camera dataset, and obtain state-of-the-art results
without strong supervision.
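The epipolar loss described in the abstract admits a compact implementation. The sketch below is an illustrative assumption, not the authors' code: it uses the symmetric epipolar distance and PyTorch, and all function names are hypothetical. The idea is that a correct match (x1, x2) satisfies x2^T F x1 = 0, where F = K2^{-T} [t]_x R K1^{-1} is the fundamental matrix from the relative pose (R, t) and camera intrinsics, so putative matches are penalised by their distance to the associated epipolar lines.

```python
# Minimal sketch of an epipolar loss for finetuning a matcher with pose-only
# supervision. Not the authors' implementation: the symmetric epipolar
# distance and all names here are illustrative assumptions.
import torch

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector t."""
    z = torch.zeros((), dtype=t.dtype)
    return torch.stack([
        torch.stack([z, -t[2], t[1]]),
        torch.stack([t[2], z, -t[0]]),
        torch.stack([-t[1], t[0], z]),
    ])

def fundamental_from_pose(K1, K2, R, t):
    """F = K2^{-T} [t]_x R K1^{-1} from relative pose (R, t) and intrinsics."""
    E = skew(t) @ R  # essential matrix
    return torch.linalg.inv(K2).T @ E @ torch.linalg.inv(K1)

def epipolar_loss(x1, x2, F, eps=1e-8):
    """Mean symmetric epipolar distance over N putative matches x1 <-> x2.

    x1, x2: (N, 2) pixel coordinates in images 1 and 2 respectively.
    A correct match satisfies p2^T F p1 = 0, i.e. p2 lies on the line F p1.
    """
    ones = torch.ones(x1.shape[0], 1, dtype=x1.dtype)
    p1 = torch.cat([x1, ones], dim=1)   # homogeneous points, (N, 3)
    p2 = torch.cat([x2, ones], dim=1)
    l2 = p1 @ F.T                       # epipolar lines F p1 in image 2
    l1 = p2 @ F                         # epipolar lines F^T p2 in image 1
    algebraic = (p2 * l2).sum(dim=1) ** 2          # (p2^T F p1)^2
    d = algebraic * (1.0 / (l2[:, 0]**2 + l2[:, 1]**2 + eps)
                   + 1.0 / (l1[:, 0]**2 + l1[:, 1]**2 + eps))
    return d.mean()
```

For the bootstrapping variant without ground-truth poses, one would presumably recover (R, t) from the model's own matches (e.g., essential-matrix estimation with RANSAC) and feed the estimate back into the same loss; the paper's exact bootstrapping schedule is not given in the abstract.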
Related papers
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- ContraNeRF: 3D-Aware Generative Model via Contrastive Learning with Unsupervised Implicit Pose Embedding [40.36882490080341]
We propose a novel 3D-aware GAN optimization technique through contrastive learning with implicit pose embeddings.
We make the discriminator estimate a high-dimensional implicit pose embedding from a given image and perform contrastive learning on the pose embedding.
Because it neither looks up nor estimates camera poses, the proposed approach can be employed on datasets where the canonical camera pose is ill-defined.
arXiv Detail & Related papers (2023-04-27T07:53:13Z)
- Long-term Visual Localization with Mobile Sensors [30.839849072256435]
We propose to leverage additional sensors on a mobile phone, mainly GPS, compass, and gravity sensor, to solve this challenging problem.
With the initial pose, we are also able to devise a direct 2D-3D matching network to efficiently establish 2D-3D correspondences.
We benchmark our method as well as several state-of-the-art baselines and demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-04-16T04:35:10Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Lidar-Monocular Surface Reconstruction Using Line Segments [5.542669744873386]
We propose to leverage common geometric features that are detected in both the LIDAR scans and image data, allowing data from the two sensors to be processed in a higher-level space.
We show that our method delivers results that are comparable to a state-of-the-art LIDAR survey while not requiring highly accurate ground truth pose estimates.
arXiv Detail & Related papers (2021-04-06T19:49:53Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks with a multi-hypothesis prediction head that can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations, named forward-kinematics, camera-projection, and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-07T06:06:20Z)
- Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation [32.6329300863371]
We study the diversity and biases present in specific datasets and their effect on cross-dataset generalization.
We find that models trained to jointly predict viewpoint and pose systematically show significantly improved cross-dataset generalization.
arXiv Detail & Related papers (2020-04-07T06:06:20Z)