WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization
- URL: http://arxiv.org/abs/2403.15272v1
- Date: Fri, 22 Mar 2024 15:15:44 GMT
- Title: WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization
- Authors: Jialu Wang, Kaichen Zhou, Andrew Markham, Niki Trigoni
- Abstract summary: WSCLoc is a system capable of being customized to various deep learning-based relocalization models.
In the initial stage, WSCLoc employs a multilayer perceptron-based structure called WFT-NeRF to co-optimize image reconstruction quality and initial pose information.
In the second stage, we co-optimize the pre-trained WFT-NeRF and WFT-Pose.
- Score: 42.85368902409545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the advancements in deep learning for camera relocalization tasks, obtaining ground truth pose labels required for the training process remains a costly endeavor. While current weakly supervised methods excel in lightweight label generation, their performance notably declines in scenarios with sparse views. In response to this challenge, we introduce WSCLoc, a system capable of being customized to various deep learning-based relocalization models to enhance their performance under weakly-supervised and sparse view conditions. This is realized with two stages. In the initial stage, WSCLoc employs a multilayer perceptron-based structure called WFT-NeRF to co-optimize image reconstruction quality and initial pose information. To ensure a stable learning process, we incorporate temporal information as input. Furthermore, instead of optimizing SE(3), we opt for $\mathfrak{sim}(3)$ optimization to explicitly enforce a scale constraint. In the second stage, we co-optimize the pre-trained WFT-NeRF and WFT-Pose. This optimization is enhanced by Time-Encoding based Random View Synthesis and supervised by inter-frame geometric constraints that consider pose, depth, and RGB information. We validate our approaches on two publicly available datasets, one outdoor and one indoor. Our experimental results demonstrate that our weakly-supervised relocalization solutions achieve superior pose estimation accuracy in sparse-view scenarios, comparable to state-of-the-art camera relocalization methods. We will make our code publicly available.
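The abstract's $\mathfrak{sim}(3)$ optimization can be illustrated with a minimal 7-DoF Sim(3) parameterization, where scale is an explicit variable rather than being implicit as in SE(3). This is a NumPy sketch of the general idea, not the authors' WFT-NeRF code; the function names are illustrative:

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-8:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def sim3_apply(params, points):
    """Apply a 7-DoF Sim(3) transform to an Nx3 point array.

    params = (rotation 3, translation 3, log-scale 1). Unlike SE(3),
    the exp(sigma) factor makes scale an explicit, optimizable quantity,
    which is the kind of scale constraint the abstract refers to.
    """
    phi, t, sigma = params[:3], params[3:6], params[6]
    R = so3_exp(phi)
    s = np.exp(sigma)  # scale is strictly positive by construction
    return s * (points @ R.T) + t
```

Optimizing over the 7-vector `params` (e.g. by gradient descent on a reprojection or photometric loss) then adjusts rotation, translation, and scale jointly.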
Related papers
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
Novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision.
It relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM).
In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Fields (NeRF).
By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances KSCR's capabilities in data-scarce environments.
The proposed system could significantly improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis.
arXiv Detail & Related papers (2024-03-15T13:40:37Z) - FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting [58.41056963451056]
We propose a few-shot view synthesis framework based on 3D Gaussian Splatting.
This framework enables real-time and photo-realistic view synthesis with as few as three training views.
FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets.
arXiv Detail & Related papers (2023-12-01T09:30:02Z) - CLIP Brings Better Features to Visual Aesthetics Learners [12.0962117940694]
Image aesthetics assessment (IAA) is one of the ideal application scenarios for such methods due to its subjective and expensive labeling procedure.
In this work, a unified and flexible two-phase CLIP-based Semi-supervised Knowledge Distillation paradigm, namely CSKD, is proposed.
arXiv Detail & Related papers (2023-07-28T16:00:21Z) - Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z) - SC-wLS: Towards Interpretable Feed-forward Camera Re-localization [29.332038781334443]
Visual re-localization aims to recover camera poses in a known environment, which is vital for applications like robotics or augmented reality.
Feed-forward absolute camera pose regression methods directly output poses by a network, but suffer from low accuracy.
We propose a feed-forward method termed SC-wLS that exploits all scene coordinate estimates for weighted least squares pose regression.
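The weighted-least-squares step behind SC-wLS can be sketched as a weighted DLT solve over 2D-3D correspondences. This is a minimal illustration, not the paper's method: in SC-wLS the per-point weights come from a learned network, whereas here they are plain inputs, and the function name is hypothetical:

```python
import numpy as np

def weighted_dlt_pose(pts3d, pts2d, weights):
    """Recover a 3x4 projection matrix from 2D-3D matches via weighted DLT.

    Each correspondence contributes two linear equations in the 12
    entries of the projection matrix; the per-point weights scale those
    rows, so unreliable scene-coordinate estimates contribute less to
    the least-squares solution.
    """
    rows = []
    for (X, Y, Z), (u, v), w in zip(pts3d, pts2d, weights):
        P = [X, Y, Z, 1.0]
        rows.append(w * np.array(P + [0.0, 0.0, 0.0, 0.0] + [-u * p for p in P]))
        rows.append(w * np.array([0.0, 0.0, 0.0, 0.0] + P + [-v * p for p in P]))
    A = np.stack(rows)
    # The right singular vector of the smallest singular value
    # minimizes ||A p|| subject to ||p|| = 1.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

With at least six well-spread correspondences the recovered matrix reprojects the 3D points onto their observed 2D locations (up to an overall scale).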
arXiv Detail & Related papers (2022-10-23T15:15:48Z) - LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF [5.364698641882657]
We present a two-stage localization mechanism in city-scale Neural Radiance Fields (NeRF).
In the place recognition stage, we train a regressor on images generated from trained NeRFs, which provides an initial value for global localization.
In the pose optimization stage, we minimize the residual between the observed image and the rendered image by directly optimizing the pose on the tangent plane.
We evaluate our method on both synthetic and real-world data and show its potential applications for high-precision navigation in large-scale city scenes.
arXiv Detail & Related papers (2022-09-18T07:56:06Z) - Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We make comprehensive explorations to SCI's inherent properties including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.