Robust Neural Routing Through Space Partitions for Camera Relocalization
in Dynamic Indoor Environments
- URL: http://arxiv.org/abs/2012.04746v2
- Date: Fri, 2 Apr 2021 22:28:33 GMT
- Title: Robust Neural Routing Through Space Partitions for Camera Relocalization
in Dynamic Indoor Environments
- Authors: Siyan Dong, Qingnan Fan, He Wang, Ji Shi, Li Yi, Thomas Funkhouser,
Baoquan Chen, Leonidas Guibas
- Abstract summary: Localizing the camera in a known indoor environment is a key building block for scene mapping, robot navigation, AR, etc.
Recent advances estimate the camera pose via optimization over the 2D/3D-3D correspondences established between the coordinates in 2D/3D camera space and 3D world space.
We propose a novel outlier-aware neural tree which bridges the two worlds, deep learning and decision tree approaches.
- Score: 39.99342226556908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localizing the camera in a known indoor environment is a key building block
for scene mapping, robot navigation, AR, etc. Recent advances estimate the
camera pose via optimization over the 2D/3D-3D correspondences established
between the coordinates in 2D/3D camera space and 3D world space. Such a
mapping is estimated with either a convolution neural network or a decision
tree using only the static input image sequence, which makes these approaches
vulnerable to dynamic indoor environments that are quite common yet challenging
in the real world. To address the aforementioned issues, in this paper, we
propose a novel outlier-aware neural tree which bridges the two worlds, deep
learning and decision tree approaches. It builds on three important blocks: (a)
a hierarchical space partition over the indoor scene to construct the decision
tree; (b) a neural routing function, implemented as a deep classification
network, employed for better 3D scene understanding; and (c) an outlier
rejection module used to filter out dynamic points during the hierarchical
routing process. Our proposed algorithm is evaluated on the RIO-10 benchmark
developed for camera relocalization in dynamic indoor environments. It achieves
robust neural routing through space partitions and outperforms the
state-of-the-art approaches by around 30% on camera pose accuracy, while
running comparably fast for evaluation.
Related papers
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z) - Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
Method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z) - ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z) - Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
arXiv Detail & Related papers (2023-05-07T16:51:34Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - 3D Human Pose Estimation in Multi-View Operating Room Videos Using
Differentiable Camera Projections [2.486571221735935]
We propose to directly optimise for localisation in 3D by training 2D CNNs end-to-end based on a 3D loss.
Using videos from the MVOR dataset, we show that this end-to-end approach outperforms optimisation in 2D space.
arXiv Detail & Related papers (2022-10-21T09:00:02Z) - Graph Attention Network for Camera Relocalization on Dynamic Scenes [1.0398909602421018]
We devise a graph attention network-based approach for learning a scene triangle mesh representation in order to estimate an image camera position in a dynamic environment.
Our approach significantly improves the camera pose accuracy of the state-of-the-art method from $0.358$ to $0.506$ on the RIO10 benchmark for dynamic indoor camera relocalization.
arXiv Detail & Related papers (2022-09-29T18:57:52Z) - VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild
Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in the $3$D space, therefore avoids making incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.