SGL: Structure Guidance Learning for Camera Localization
- URL: http://arxiv.org/abs/2304.05571v1
- Date: Wed, 12 Apr 2023 02:20:29 GMT
- Title: SGL: Structure Guidance Learning for Camera Localization
- Authors: Xudong Zhang, Shuang Gao, Xiaohu Nan, Haikuan Ning, Yuchen Yang,
Yishan Ping, Jixiang Wan, Shuzhou Dong, Jijunnan Li, Yandong Guo
- Abstract summary: We focus on scene coordinate prediction methods and propose a network architecture named Structure Guidance Learning (SGL) which utilizes the receptive branch and the structure branch to extract both high-level and low-level features.
- Score: 7.094881396940598
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Camera localization is a classical computer vision task that serves various
Artificial Intelligence and Robotics applications. With the rapid developments
of Deep Neural Networks (DNNs), end-to-end visual localization methods have
flourished in recent years. In this work, we focus on scene coordinate
prediction methods and propose a network architecture named Structure Guidance
Learning (SGL) which utilizes the receptive branch and the structure branch to
extract both high-level and low-level features to estimate the 3D coordinates.
We design a confidence strategy to refine and filter the predicted 3D
observations, which enables us to estimate the camera poses by employing the
Perspective-n-Point (PnP) algorithm with RANSAC. During training, we design a
Bundle Adjustment trainer to help the network fit the scenes better.
Comparisons with state-of-the-art (SOTA) methods and extensive ablation
experiments confirm the validity of our proposed architecture.
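The pose-solving stage described above can be sketched with off-the-shelf tools. The sketch below assumes dense per-pixel scene coordinates and confidences from the network (the variable names and the confidence threshold are illustrative, not from the paper) and recovers the pose with OpenCV's PnP-with-RANSAC solver:
```python
# Minimal sketch of the pose-solving stage: per-pixel 3D scene coordinates
# and confidences are assumed to come from a scene-coordinate network
# (names and the 0.5 threshold are illustrative, not from the paper).
import cv2
import numpy as np

def estimate_pose(coords_3d, confidences, K, conf_thresh=0.5):
    """Recover a camera pose from dense scene-coordinate predictions.

    coords_3d:   (H, W, 3) predicted 3D scene coordinates per pixel
    confidences: (H, W)    predicted confidence per pixel
    K:           (3, 3)    camera intrinsic matrix
    """
    H, W = confidences.shape
    # 2D pixel grid corresponding to each prediction.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pts_2d = np.stack([u, v], axis=-1).reshape(-1, 2).astype(np.float64)
    pts_3d = coords_3d.reshape(-1, 3).astype(np.float64)

    # Confidence strategy: keep only observations the network trusts.
    keep = confidences.reshape(-1) > conf_thresh
    pts_2d, pts_3d = pts_2d[keep], pts_3d[keep]

    # PnP inside a RANSAC loop rejects the remaining outliers.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, None,
        iterationsCount=100, reprojectionError=8.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation
    return R, tvec
```
Feeding fewer but more trustworthy 2D-3D observations into the RANSAC loop is the point of the confidence strategy mentioned in the abstract.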
Related papers
- The NeRFect Match: Exploring NeRF Features for Visual Localization [45.42725089658268]
We propose the use of Neural Radiance Fields (NeRF) as a scene representation for visual localization.
We extend its recognized advantages by exploring the potential of NeRF's internal features in establishing precise 2D-3D matches for localization.
We introduce NeRFMatch, an advanced 2D-3D matching function that capitalizes on the internal knowledge of NeRF learned via view synthesis.
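As a rough illustration of what establishing 2D-3D matches involves, the sketch below runs generic mutual-nearest-neighbour matching between image descriptors and descriptors attached to 3D points (here standing in for NeRF features); NeRFMatch's actual matching function is learned and more sophisticated.
```python
# Generic mutual-nearest-neighbour matching between image descriptors and
# descriptors attached to 3D points; a simplified stand-in, not NeRFMatch's
# actual learned matching function.
import numpy as np

def mutual_nn_matches(desc_2d, desc_3d):
    """desc_2d: (N, D) image descriptors; desc_3d: (M, D) 3D-point descriptors.
    Returns index pairs (i, j) that are each other's nearest neighbour."""
    # Cosine similarity after L2 normalisation.
    a = desc_2d / np.linalg.norm(desc_2d, axis=1, keepdims=True)
    b = desc_3d / np.linalg.norm(desc_3d, axis=1, keepdims=True)
    sim = a @ b.T                      # (N, M) similarity matrix
    nn12 = sim.argmax(axis=1)          # best 3D match per 2D descriptor
    nn21 = sim.argmax(axis=0)          # best 2D match per 3D descriptor
    ids = np.arange(len(desc_2d))
    mutual = nn21[nn12] == ids         # keep only mutual agreements
    return np.stack([ids[mutual], nn12[mutual]], axis=1)
```
The resulting 2D-3D matches could then feed a PnP solver such as the one sketched earlier.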
arXiv Detail & Related papers (2024-03-14T17:11:49Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address the limitations of prior localization approaches.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
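For intuition, heatmap-style landmark detectors are commonly decoded as below: one output channel per predetermined landmark, with the peak giving its 2D location (a generic sketch, not necessarily this paper's exact decoding).
```python
# Generic decoding of a heatmap-style landmark detector: one channel per
# predetermined scene landmark, the peak gives its 2D location.
import numpy as np

def decode_landmarks(heatmaps, min_score=0.3):
    """heatmaps: (L, H, W) network output, one channel per scene landmark.
    Returns (landmark_id, u, v, score) for confidently detected landmarks."""
    L, H, W = heatmaps.shape
    flat = heatmaps.reshape(L, -1)
    idx = flat.argmax(axis=1)                 # peak per channel
    scores = flat[np.arange(L), idx]
    v, u = np.unravel_index(idx, (H, W))
    keep = scores > min_score                 # drop landmarks not visible
    return [(i, int(u[i]), int(v[i]), float(scores[i]))
            for i in np.arange(L)[keep]]
```
Pairing each detected landmark with its known scene-specific 3D point again reduces pose recovery to PnP.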
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
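Once correspondences are in hand, a rigid pose can be recovered in closed form; the SVD-based Kabsch/Umeyama solution below is the classical building block for point correspondences, shown here as a standard reference rather than this paper's learned alignment.
```python
# Classical closed-form rigid alignment (Kabsch/Umeyama) from point
# correspondences; a standard building block, not this paper's learnable
# transformation alignment.
import numpy as np

def rigid_pose_from_correspondences(src, dst):
    """src, dst: (N, 3) corresponding 3D points. Returns R, t such that
    dst ≈ R @ src + t in the least-squares sense over all correspondences."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```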
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
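A common way to make ray samples cover a camera's effectively infinite range in unbounded scenes is a scene contraction of the kind below (mip-NeRF-360 style); it is shown only to illustrate the idea and may differ from OccNeRF's actual parameterization.
```python
# mip-NeRF-360-style scene contraction: maps all of R^3 into a ball of
# radius 2, so ray samples can cover an effectively infinite range.
# A sketch of the general idea, not necessarily OccNeRF's parameterization.
import numpy as np

def contract(x):
    """x: (..., 3) world-space points. Points with ||x|| <= 1 are kept;
    farther points are squashed so the whole space fits in radius 2."""
    n = np.linalg.norm(x, axis=-1, keepdims=True)
    n_safe = np.maximum(n, 1e-9)             # avoid division by zero at origin
    return np.where(n <= 1.0, x, (2.0 - 1.0 / n_safe) * (x / n_safe))
```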
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers [111.55817466296402]
We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry.
PCLs deterministically remove the location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unchanged.
PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware.
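The underlying idea can be illustrated with a virtual-camera homography: rotating a virtual camera to look straight at the region of interest and resampling removes the location-dependent perspective distortion of a plain 2D crop (a conceptual sketch, not the paper's implementation).
```python
# Conceptual sketch of a perspective crop: resample the image as seen by a
# virtual camera whose optical axis passes through the crop centre, using the
# homography H = K_virt @ R @ K^-1. Illustrates the idea, not the paper's code.
import cv2
import numpy as np

def perspective_crop(image, K, center_uv, crop_size=256, focal=500.0):
    """Resample a crop around pixel center_uv with a rotated virtual camera.
    Assumes the viewing ray of the centre is not parallel to the up vector."""
    u, v = center_uv
    # Viewing ray of the crop centre in camera coordinates.
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    z = d / np.linalg.norm(d)
    # Build a rotation whose third row is the new optical axis.
    x = np.cross([0.0, 1.0, 0.0], z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])            # rotates original rays into the virtual camera
    # Intrinsics of the virtual camera, centred on the crop.
    K_virt = np.array([[focal, 0.0, crop_size / 2],
                       [0.0, focal, crop_size / 2],
                       [0.0, 0.0, 1.0]])
    H = K_virt @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, (crop_size, crop_size))
```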
arXiv Detail & Related papers (2020-11-27T08:48:43Z)
- 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning [11.599633757222406]
Recently, end-to-end approaches based on convolutional neural networks have been widely studied and can match or even exceed traditional 3D-geometry based methods.
In this work, we propose a compact network for absolute camera pose regression.
Inspired by those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information, including motion, depth, and image contents.
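One generic form of such a constraint is a reprojection consistency term: lift pixels with their depths, transfer them through the relative pose, and penalize the residual against observed correspondences (a sketch of the general idea, not this paper's exact loss).
```python
# Generic geometry-aware residual: lift pixels with depth, transfer them
# through the relative pose, and measure reprojection error. A sketch of the
# general idea, not the paper's exact formulation.
import numpy as np

def reprojection_residuals(pts_uv, depth, K, R, t, pts_uv_other):
    """pts_uv: (N, 2) pixels in frame A with depths (N,); R, t: pose of
    frame B relative to A; pts_uv_other: (N, 2) matching pixels in frame B.
    Returns per-point reprojection error in pixels."""
    ones = np.ones((len(pts_uv), 1))
    rays = (np.linalg.inv(K) @ np.hstack([pts_uv, ones]).T).T   # (N, 3)
    X_a = rays * depth[:, None]          # 3D points in frame A coordinates
    X_b = (R @ X_a.T).T + t              # transfer into frame B
    proj = (K @ X_b.T).T
    proj = proj[:, :2] / proj[:, 2:3]    # perspective division
    return np.linalg.norm(proj - pts_uv_other, axis=1)
```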
arXiv Detail & Related papers (2020-05-13T04:15:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.