Continual Learning for Image-Based Camera Localization
- URL: http://arxiv.org/abs/2108.09112v1
- Date: Fri, 20 Aug 2021 11:18:05 GMT
- Title: Continual Learning for Image-Based Camera Localization
- Authors: Shuzhe Wang and Zakaria Laskar and Iaroslav Melekhov and Xiaotian Li and Juho Kannala
- Abstract summary: We study the problem of visual localization in a continual learning setup.
Our results show that similar to the classification domain, non-stationary data induces catastrophic forgetting in deep networks for visual localization.
We propose a new sampling method based on coverage score (Buff-CS) that adapts the existing sampling strategies in the buffering process to the problem of visual localization.
- Score: 14.47046413243358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For several emerging technologies such as augmented reality, autonomous
driving and robotics, visual localization is a critical component. Directly
regressing camera pose/3D scene coordinates from the input image using deep
neural networks has shown great potential. However, such methods assume a
stationary data distribution with all scenes simultaneously available during
training. In this paper, we approach the problem of visual localization in a
continual learning setup -- whereby the model is trained on scenes in an
incremental manner. Our results show that similar to the classification domain,
non-stationary data induces catastrophic forgetting in deep networks for visual
localization. To address this issue, a strong baseline based on storing and
replaying images from a fixed buffer is proposed. Furthermore, we propose a new
sampling method based on coverage score (Buff-CS) that adapts the existing
sampling strategies in the buffering process to the problem of visual
localization. Results demonstrate consistent improvements over standard
buffering methods on two challenging datasets -- 7Scenes and 12Scenes -- as well
as on 19Scenes, obtained by combining the two.
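The abstract does not spell out the coverage score, so the following is only a minimal sketch of the replay-buffer idea: a fixed-capacity buffer that swaps a new (image, pose) pair in only when doing so improves a stand-in coverage measure over the buffered camera positions. The `CoverageBuffer` class and its nearest-neighbour proxy score are hypothetical, not the paper's Buff-CS.

```python
import numpy as np

class CoverageBuffer:
    """Fixed-size replay buffer that prefers pose-space coverage.

    Sketch only: `_coverage` is a simple stand-in that rewards
    spreading buffered camera positions apart, not the paper's score.
    Poses are assumed to be vectors whose first 3 entries are the
    camera centre.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.images, self.poses = [], []

    def _coverage(self, positions):
        # Proxy score: sum of each sample's distance to its nearest
        # neighbour; larger means the buffer covers the scene better.
        d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return d.min(axis=1).sum()

    def add(self, image, pose):
        if len(self.images) < self.capacity:
            self.images.append(image)
            self.poses.append(pose)
            return
        # Try swapping the candidate into each slot; keep the swap
        # that yields the highest coverage, if it beats the status quo.
        pos = np.stack([p[:3] for p in self.poses])
        best, best_i = self._coverage(pos), None
        for i in range(self.capacity):
            trial = pos.copy()
            trial[i] = pose[:3]
            score = self._coverage(trial)
            if score > best:
                best, best_i = score, i
        if best_i is not None:
            self.images[best_i], self.poses[best_i] = image, pose
```

Greedy swapping keeps the buffer spread across the scene, which is the intuition behind coverage-based sampling; the actual score and update rule in Buff-CS may differ.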
Related papers
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
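As a rough illustration of the idea (the paper's hierarchy and few-shot training procedure are not reproduced here), the sketch below classifies each pixel into one of a fixed set of scene regions and turns confident predictions into 2D-3D correspondences via a hypothetical table of per-region 3D centroids, ready for PnP-based pose estimation.

```python
import torch
import torch.nn as nn

class RegionClassifier(nn.Module):
    """Minimal per-pixel scene-region classifier (illustrative)."""

    def __init__(self, num_regions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_regions, 1),   # per-pixel region logits
        )

    def forward(self, x):            # x: (B, 3, H, W)
        return self.net(x)           # (B, num_regions, H, W)

# region_centroids: (num_regions, 3) table of 3D region centres built
# from the few mapping images (hypothetical preprocessing step).
def to_correspondences(logits, region_centroids, conf_thresh=0.9):
    """logits: (num_regions, H, W) for a single query image."""
    prob, label = logits.softmax(dim=0).max(dim=0)   # (H, W) each
    ys, xs = torch.nonzero(prob > conf_thresh, as_tuple=True)
    pts2d = torch.stack([xs, ys], dim=1).float()     # pixel coords
    pts3d = region_centroids[label[ys, xs]]          # matched 3D points
    return pts2d, pts3d                              # feed to PnP+RANSAC
```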
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
- ImPosing: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
Rather than regressing the camera position and orientation directly, candidate poses are evaluated through the latent space in a hierarchical manner and progressively refined.
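A minimal sketch of that retrieval-style step, under assumed interfaces: `pose_encoder` maps candidate poses into the same latent space as the query image feature, candidates are ranked by cosine similarity, and refinement resamples around the survivors with shrinking noise. None of these names or the 7-D pose format come from the paper.

```python
import torch
import torch.nn.functional as F

def rank_candidates(image_feat, cand_poses, pose_encoder, top_k=8):
    """image_feat: (D,) latent of the query image.
    cand_poses: (N, 7) candidates (xyz + quaternion, an assumption)."""
    cand_feats = pose_encoder(cand_poses)                 # (N, D)
    sims = F.cosine_similarity(cand_feats, image_feat[None], dim=1)
    return cand_poses[sims.topk(top_k).indices]           # best first

def refine(image_feat, pose_encoder, init_poses, steps=3, noise=0.5):
    """Coarse-to-fine: resample around the best candidate with
    shrinking noise, re-ranking through the latent space each step.
    Quaternion renormalization is omitted for brevity."""
    best = init_poses
    for _ in range(steps):
        jitter = noise * torch.randn(64, best.size(-1))
        cands = torch.cat([best, best[:1] + jitter])      # keep survivors
        best = rank_candidates(image_feat, cands, pose_encoder)
        noise *= 0.5
    return best[0]                                        # top candidate
```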
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video [4.5307040147072275]
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences.
In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image; these estimates are then combined to recover the camera pose.
Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
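The summary does not give the combination step; a standard way (assumed here, via OpenCV) to turn predicted per-pixel scene coordinates into a camera pose is PnP with RANSAC:

```python
import numpy as np
import cv2

def pose_from_scene_coords(scene_coords, K):
    """scene_coords: (H, W, 3) predicted 3D world point per pixel.
    K: (3, 3) camera intrinsics. Returns rotation and translation
    mapping world points into the camera frame."""
    h, w, _ = scene_coords.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts2d = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    pts3d = scene_coords.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)     # axis-angle -> rotation matrix
    return R, tvec
```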
arXiv Detail & Related papers (2022-03-24T02:11:03Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show our approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
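That insight translates directly into a loss; below is a minimal sketch assuming 4x4 homogeneous pose matrices, where the absolute query pose computed through two different reference images is forced to agree. The paper's exact weighting of rotation and translation terms may differ.

```python
import torch

def transform_consistency_loss(rel_a, rel_b, abs_a, abs_b):
    """rel_*: (B, 4, 4) predicted pose of the query expressed in
    reference *'s frame; abs_*: (B, 4, 4) known map pose of each
    reference image."""
    query_via_a = abs_a @ rel_a        # absolute query pose via ref A
    query_via_b = abs_b @ rel_b        # ...and via ref B
    # Penalize disagreement between the two estimates; a full method
    # would likely weight rotation and translation separately.
    return (query_via_a - query_via_b).abs().mean()
```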
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
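A rough sketch of such targets, assuming optical flow between consecutive frames is available: divide the frame into a grid, find the cell with the largest mean motion, and quantize its dominant direction. The grid size and angle binning are illustrative, not the paper's exact design.

```python
import numpy as np

def motion_statistics(flow, grid=4, num_bins=8):
    """flow: (H, W, 2) optical flow field between two frames.
    Returns the grid cell with the largest motion and its dominant
    direction, quantized into num_bins angular bins."""
    h, w, _ = flow.shape
    gh, gw = h // grid, w // grid
    mags = np.zeros((grid, grid))
    dirs = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = flow[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            dx, dy = cell[..., 0].mean(), cell[..., 1].mean()
            mags[i, j] = np.hypot(dx, dy)      # mean motion magnitude
            dirs[i, j] = np.arctan2(dy, dx)    # mean motion angle
    loc = np.unravel_index(mags.argmax(), mags.shape)   # largest motion
    angle = dirs[loc] % (2 * np.pi)
    direction = int(angle / (2 * np.pi) * num_bins)     # quantized bin
    return loc, direction
```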
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
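The summary stops at the likelihood map; a common follow-up step (an assumption here, not taken from the paper) is to extract detections as thresholded local maxima of that map:

```python
import torch
import torch.nn.functional as F

def peaks_from_likelihood(heatmap, thresh=0.5, window=5):
    """heatmap: (H, W) tensor of per-pixel person likelihoods.
    Returns (N, 2) pixel coordinates of detected people."""
    h = heatmap[None, None]                       # (1, 1, H, W)
    pooled = F.max_pool2d(h, window, stride=1, padding=window // 2)
    is_peak = (h == pooled) & (h > thresh)        # local maxima only
    ys, xs = torch.nonzero(is_peak[0, 0], as_tuple=True)
    return torch.stack([xs, ys], dim=1)           # (x, y) per detection
```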
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- Adversarial Transfer of Pose Estimation Regression [11.117357750374035]
We develop a deep adaptation network for learning scene-invariant image representations and use adversarial learning to generate representations for model transfer.
We evaluate our network on two public datasets, Cambridge Landmarks and 7Scenes, demonstrating its superiority over several baselines and comparing it to state-of-the-art methods.
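A gradient reversal layer is the usual device for this kind of adversarial feature alignment; the sketch below is one plausible instantiation, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward
    pass, so the encoder learns to fool the scene discriminator."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip sign toward the encoder

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: shared encoder features feed both the pose head and,
# through grad_reverse, a scene classifier whose loss the encoder
# therefore maximizes (scene confusion -> scene-invariant features).
scene_disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
# logits = scene_disc(grad_reverse(encoder_feats))
```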
arXiv Detail & Related papers (2020-06-20T21:16:37Z)