RelMobNet: A Robust Two-Stage End-to-End Training Approach for
MobileNetV3-Based Relative Camera Pose Estimation
- URL: http://arxiv.org/abs/2202.12838v1
- Date: Fri, 25 Feb 2022 17:27:26 GMT
- Authors: Praveen Kumar Rajendran, Sumit Mishra, Luiz Felipe Vecchietti, Dongsoo
Har
- Abstract summary: Relative camera pose estimation plays a pivotal role in dealing with 3D reconstruction and visual localization.
We propose a Siamese network based on MobileNetV3-Large for an end-to-end relative camera pose regression independent of camera parameters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relative camera pose estimation plays a pivotal role in dealing with 3D
reconstruction and visual localization. To address this, we propose a Siamese
network based on MobileNetV3-Large for an end-to-end relative camera pose
regression independent of camera parameters. The proposed network uses a pair of
images taken at different locations in the same scene to estimate the 3D
translation vector and the rotation represented as a unit quaternion. To increase the
generality of the model, rather than training it for a single scene, data for
four scenes are combined to train a single universal model to estimate the
relative pose. Further, to remain independent of hyperparameters, no explicit
weighting between the translation and rotation losses is used. Instead, we use a novel
two-stage training procedure to learn the balance implicitly, with faster convergence. We
compare the results obtained on the Cambridge Landmarks dataset, comprising
different scenes, with existing CNN-based regression methods as baselines,
e.g., RPNet and RCPNet. The findings indicate that, compared to RCPNet, the
proposed model improves the estimation of the translation vector by a
percentage change of 16.11%, 28.88%, and 52.27% on the Kings College, Old Hospital,
and St Marys Church scenes of the Cambridge Landmarks dataset, respectively.
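The abstract describes regressing a 3D translation vector and a rotation expressed as a unit quaternion between an image pair, combined in an unweighted loss. A minimal NumPy sketch of those quantities is given below; the quaternion conventions, the simplified relative-translation formula, and all function names are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(q):
    """Quaternion conjugate (inverse for unit quaternions)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def relative_pose(t1, q1, t2, q2):
    """Relative pose of view 2 w.r.t. view 1 from two absolute poses.

    The translation term is deliberately simplified; a full derivation
    would also rotate the difference into view 1's coordinate frame.
    """
    q_rel = qmul(q2, qconj(q1))
    t_rel = t2 - t1
    return t_rel, q_rel / np.linalg.norm(q_rel)

def pose_loss(t_pred, t_true, q_pred, q_true):
    """Unweighted sum of translation and rotation errors: no hand-tuned
    beta balancing the two terms, matching the abstract's claim that the
    balance is learned implicitly instead of weighted explicitly."""
    q_pred = q_pred / np.linalg.norm(q_pred)  # project onto unit quaternions
    return np.linalg.norm(t_pred - t_true) + np.linalg.norm(q_pred - q_true)
```

Normalizing the predicted quaternion before computing the error is a common choice in pose-regression work, since an unconstrained regression head does not output unit-norm vectors by itself.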
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision.
We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS).
arXiv Detail & Related papers (2023-12-14T15:31:33Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
arXiv Detail & Related papers (2023-05-08T17:59:58Z)
- Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression [13.233301155616616]
This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images.
Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression.
We evaluate our method on several datasets and show that it outperforms previous end-to-end methods.
arXiv Detail & Related papers (2022-11-27T22:01:47Z)
- Camera Calibration through Camera Projection Loss [4.36572039512405]
We propose a novel method to predict intrinsic (focal length and principal point offset) parameters using an image pair.
Unlike existing methods, we propose a new representation that incorporates camera model equations as a neural network in a multi-task learning framework.
Our proposed approach achieves better performance with respect to both deep learning-based and traditional methods on 7 out of 10 parameters evaluated.
arXiv Detail & Related papers (2021-10-07T14:03:10Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- Wide-Baseline Relative Camera Pose Estimation with Directional Learning [46.21836501895394]
We introduce DirectionNet, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable.
We evaluate our model on challenging synthetic and real pose estimation datasets constructed from Matterport3D and InteriorNet.
arXiv Detail & Related papers (2021-06-07T04:46:09Z)
- Robust 2D/3D Vehicle Parsing in CVIS [54.825777404511605]
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
arXiv Detail & Related papers (2021-03-11T03:35:05Z)
- 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)