Unsupervised Simultaneous Learning for Camera Re-Localization and Depth
Estimation from Video
- URL: http://arxiv.org/abs/2203.12804v1
- Date: Thu, 24 Mar 2022 02:11:03 GMT
- Title: Unsupervised Simultaneous Learning for Camera Re-Localization and Depth
Estimation from Video
- Authors: Shun Taguchi and Noriaki Hirose
- Abstract summary: We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences.
In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose.
Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
- Score: 4.5307040147072275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an unsupervised simultaneous learning framework for the task of
monocular camera re-localization and depth estimation from unlabeled video
sequences. Monocular camera re-localization refers to the task of estimating
the absolute camera pose from an instance image in a known environment, which
has been intensively studied for alternative localization in GPS-denied
environments. In recent works, camera re-localization methods are trained via
supervised learning from pairs of camera images and camera poses. In contrast
to previous works, we propose a completely unsupervised learning framework for
camera re-localization and depth estimation, requiring only monocular video
sequences for training. In our framework, we train two networks that estimate
the scene coordinates using directions and the depth map from each image, which
are then combined to estimate the camera pose. The networks can be trained
through the minimization of loss functions based on our loop closed view
synthesis. In experiments with the 7-scenes dataset, the proposed method
outperformed the re-localization of the state-of-the-art visual SLAM,
ORB-SLAM3. Our method also outperforms state-of-the-art monocular depth
estimation in a trained environment.
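The abstract does not spell out how the two network outputs are fused into a pose, but aligning per-pixel scene coordinates (world frame) with depth back-projected into the camera frame is a rigid point-set registration problem. The sketch below, with hypothetical helper names, shows one standard solver for that step, a least-squares Kabsch fit; it illustrates the idea and is not the paper's implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points using intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # per-pixel normalized rays
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def kabsch_pose(cam_pts, world_pts):
    """Least-squares rigid fit: world_pts ~ R @ cam_pts + t (Kabsch)."""
    mu_c, mu_w = cam_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (world_pts - mu_w)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t

# cam_pts would come from the depth network, world_pts from the
# scene-coordinate network evaluated at the same pixels:
# R, t = kabsch_pose(backproject(depth, K), scene_coords.reshape(-1, 3))
```

In practice a RANSAC loop around the fit would be needed to reject pixels where either network's prediction is unreliable.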
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
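SRPose is a learned framework; for context, the classical sparse-keypoint route to two-view relative pose that such methods are measured against (essential-matrix estimation plus decomposition) fits in a few lines of OpenCV. A generic baseline sketch, not SRPose itself:

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """Two-view relative pose from matched sparse keypoints.

    pts1, pts2: (N, 2) float arrays of matched pixel coordinates.
    K: (3, 3) camera intrinsics. Returns R and a unit-norm translation t.
    """
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```

Note that two views determine the translation only up to scale, a well-known limitation of this classical pipeline.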
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z)
- Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera.
The introduced approach focuses on outdoor scenes, where recovering an accurate geometric scaffold and camera poses is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z)
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
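Once pixel-to-scene-coordinate correspondences are available, the 6-DoF pose is conventionally recovered with a PnP solver inside RANSAC. A minimal sketch of that final step with OpenCV (the inputs are placeholders for the method's predictions):

```python
import cv2
import numpy as np

def pose_from_scene_coords(pixels, scene_xyz, K):
    """6-DoF pose from 2D pixel / 3D scene-coordinate correspondences.

    pixels: (N, 2) pixel locations; scene_xyz: (N, 3) predicted world points.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        scene_xyz.astype(np.float64), pixels.astype(np.float64), K, None,
        reprojectionError=8.0)
    R, _ = cv2.Rodrigues(rvec)       # world-to-camera rotation matrix
    return R, tvec, inliers
```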
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
Instead of being directly regressed, the camera position and orientation are refined by evaluating candidates through the latent space in a hierarchical manner.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
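For contrast with SelfTune's learned scaling, the common non-learned baseline is to align a relative depth map to sparse metric measurements with a robust ratio; this quick sketch (not SelfTune's method) makes the problem concrete:

```python
import numpy as np

def median_scale(pred_depth, metric_depth, valid):
    """Attach metric scale to a relative depth map.

    metric_depth: sparse metric measurements (e.g. a few ranged points);
    valid: boolean mask marking where those measurements exist.
    """
    scale = np.median(metric_depth[valid] / pred_depth[valid])
    return scale * pred_depth
```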
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Continual Learning for Image-Based Camera Localization [14.47046413243358]
We study the problem of visual localization in a continual learning setup.
Our results show that similar to the classification domain, non-stationary data induces catastrophic forgetting in deep networks for visual localization.
We propose a new sampling method based on coverage score (Buff-CS) that adapts the existing sampling strategies in the buffering process to the problem of visual localization.
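The coverage score itself is defined in the paper; as a generic stand-in for the idea, farthest-point sampling over camera positions keeps buffered frames spread across the mapped scene. All names below are illustrative, and this is not the Buff-CS scoring rule:

```python
import numpy as np

def coverage_buffer(positions, k):
    """Greedily pick k frames whose camera positions cover the scene.

    positions: (N, 3) camera positions of candidate frames.
    Returns indices of the frames kept in the replay buffer.
    """
    keep = [0]
    dist = np.linalg.norm(positions - positions[0], axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())                 # farthest from current buffer
        keep.append(nxt)
        dist = np.minimum(dist,
                          np.linalg.norm(positions - positions[nxt], axis=1))
    return keep
```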
arXiv Detail & Related papers (2021-08-20T11:18:05Z)
- Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes [85.56602190773684]
We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view.
By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained completely unsupervised.
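This is the same view-synthesis self-supervision that the main paper extends with loop closure. A compact PyTorch sketch of the standard single-pair photometric loss, assuming known intrinsics K and a predicted target-to-source transform T (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def view_synthesis_loss(target, source, depth, T, K):
    """L1 photometric error between target and source warped into its view.

    target, source: (B, 3, H, W) images; depth: (B, 1, H, W) target depth;
    T: (B, 4, 4) target-to-source transform; K: (B, 3, 3) intrinsics.
    """
    b, _, h, w = target.shape
    v, u = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                          torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).reshape(1, 3, -1)
    cam = (torch.inverse(K) @ pix.expand(b, -1, -1)) * depth.reshape(b, 1, -1)
    cam = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)  # homogeneous
    proj = K @ (T @ cam)[:, :3]                             # into source view
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,         # x to [-1, 1]
                        2 * uv[:, 1] / (h - 1) - 1],        # y to [-1, 1]
                       dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```

Minimizing this loss trains the depth and pose networks jointly without any labels, which is the unsupervised signal these methods share.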
arXiv Detail & Related papers (2021-05-05T17:08:10Z)
- 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)
- Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition [8.220945563455848]
A new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with new video cameras.
The proposed metric learning technique is used to train deep Siamese networks under different training scenarios.
arXiv Detail & Related papers (2020-02-11T05:06:30Z)
- Unsupervised Learning of Camera Pose with Compositional Re-estimation [10.251550038802343]
Given an input video sequence, our goal is to estimate the camera pose (i.e. the camera motion) between consecutive frames.
We propose an alternative approach that utilizes a compositional re-estimation process for camera pose estimation.
Our approach significantly improves the predicted camera motion both quantitatively and visually.
arXiv Detail & Related papers (2020-01-17T18:59:07Z)