S3E-GNN: Sparse Spatial Scene Embedding with Graph Neural Networks for
Camera Relocalization
- URL: http://arxiv.org/abs/2205.05861v1
- Date: Thu, 12 May 2022 03:21:45 GMT
- Title: S3E-GNN: Sparse Spatial Scene Embedding with Graph Neural Networks for
Camera Relocalization
- Authors: Ran Cheng, Xinyu Jiang, Yuan Chen, Lige Liu, Tao Sun
- Abstract summary: This paper proposes a learning-based approach, named Sparse Spatial Scene Embedding with Graph Neural Networks (S3E-GNN).
In the encoding module, a trained S3E network encodes RGB images into embedding codes that implicitly represent spatial and semantic scene information.
With the embedding codes and the associated poses obtained from a SLAM system, each image is represented as a graph node in a pose graph.
In the GNN query module, the pose graph is transformed into an embedding-aggregated reference graph for camera relocalization.
- Score: 11.512647893596029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera relocalization is the key component of simultaneous localization and
mapping (SLAM) systems. This paper proposes a learning-based approach, named
Sparse Spatial Scene Embedding with Graph Neural Networks (S3E-GNN), as an
end-to-end framework for efficient and robust camera relocalization. S3E-GNN
consists of two modules. In the encoding module, a trained S3E network encodes
RGB images into embedding codes that implicitly represent spatial and semantic
scene information. With the embedding codes and the associated poses obtained from a
SLAM system, each image is represented as a graph node in a pose graph. In the
GNN query module, the pose graph is transformed into an embedding-aggregated
reference graph for camera relocalization. We collect various scene datasets in
challenging environments to perform experiments. Our results demonstrate
that the S3E-GNN method outperforms the traditional Bag-of-Words (BoW) approach for
camera relocalization, owing to its learning-based embeddings and GNN-powered scene
matching mechanism.
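As an illustration of the two-module design, here is a minimal, hypothetical sketch (our own construction; the mean-aggregation GNN layer, the embedding size, and the nearest-neighbor query below are assumptions, not the authors' architecture):

```python
# Hypothetical sketch of the S3E-GNN idea (not the authors' code): each
# keyframe is a node holding an embedding code and its SLAM pose; a simple
# mean-aggregation GNN layer mixes neighbor embeddings into a reference
# graph, and a query image is relocalized by nearest-neighbor retrieval.
import torch

class MeanAggGNNLayer(torch.nn.Module):
    """One message-passing step: emb <- relu(W1*emb + W2*mean(neighbors))."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_self = torch.nn.Linear(dim, dim)
        self.w_nbr = torch.nn.Linear(dim, dim)

    def forward(self, emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # node degrees
        nbr_mean = (adj @ emb) / deg                       # mean neighbor embedding
        return torch.relu(self.w_self(emb) + self.w_nbr(nbr_mean))

# Toy pose graph: 5 keyframes, 64-d embeddings (stand-ins for S3E codes),
# chain-shaped covisibility edges as a SLAM front end might produce.
n, dim = 5, 64
emb = torch.randn(n, dim)                  # per-keyframe embedding codes
poses = torch.randn(n, 7)                  # toy (tx, ty, tz, qx, qy, qz, qw)
adj = torch.zeros(n, n)
for i in range(n - 1):                     # connect consecutive keyframes
    adj[i, i + 1] = adj[i + 1, i] = 1.0

layer = MeanAggGNNLayer(dim)
ref = layer(emb, adj)                      # embedding-aggregated reference graph

# Relocalize a query image: encode it, retrieve the most similar node, and
# reuse that node's pose as the coarse relocalization result.
query = torch.randn(1, dim)
sim = torch.nn.functional.cosine_similarity(ref, query, dim=1)
best = int(sim.argmax())
print("matched keyframe:", best, "pose:", poses[best])
```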
Related papers
- Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Fields (NeRF).
By generating novel poses and feeding them into a trained NeRF model to create new views, the approach enhances KSCR's capabilities in data-scarce environments.
The proposed system can improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis.
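A hedged sketch of that augmentation loop follows; the StubNeRF and StubKSCR classes are placeholders we introduce for illustration, not the paper's API:

```python
# Sketch: sample novel camera poses, render synthetic views with a trained
# NeRF, and use the (image, pose) pairs as extra descriptor supervision.
# StubNeRF/StubKSCR are stand-ins, not real interfaces.
import numpy as np

class StubNeRF:
    def render(self, pose: np.ndarray) -> np.ndarray:
        return np.zeros((480, 640, 3))     # stand-in for a trained renderer

class StubKSCR:
    def train_step(self, image: np.ndarray, pose: np.ndarray) -> None:
        pass                               # stand-in for the descriptor trainer

def sample_novel_pose(rng: np.random.Generator) -> np.ndarray:
    """Jitter a nominal camera-to-world transform to get a new viewpoint."""
    pose = np.eye(4)
    pose[:3, 3] = rng.uniform(-0.5, 0.5, size=3)   # random translation offset
    return pose

rng = np.random.default_rng(0)
nerf, kscr = StubNeRF(), StubKSCR()
for _ in range(3):                         # synthesize a few extra views
    pose = sample_novel_pose(rng)
    kscr.train_step(nerf.render(pose), pose)
```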
arXiv Detail & Related papers (2024-03-15T13:40:37Z)
- CP-SLAM: Collaborative Neural Point-based SLAM System [54.916578456416204]
This paper presents a collaborative implicit neural localization and mapping (SLAM) system with RGB-D image sequences.
In order to enable all these modules in a unified framework, we propose a novel neural point-based 3D scene representation.
A distributed-to-centralized learning strategy is proposed for the collaborative implicit SLAM to improve consistency and cooperation.
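The sketch below shows one plausible reading of a neural point-based representation (our assumption of the general pattern, not CP-SLAM's code): learnable features anchored at 3D points, decoded by interpolating nearest neighbors:

```python
# Sketch: anchor feature vectors at 3D points; answer a query location by
# distance-weighted interpolation of its k nearest neural points.
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(0, 1, size=(1000, 3))      # neural point positions
features = rng.normal(size=(1000, 32))          # per-point learnable features

def query_feature(x: np.ndarray, k: int = 8) -> np.ndarray:
    """Interpolate the features of the k nearest neural points around x."""
    d = np.linalg.norm(points - x, axis=1)
    idx = np.argsort(d)[:k]                      # k nearest neighbors
    w = 1.0 / (d[idx] + 1e-8)                    # inverse-distance weights
    return (w[:, None] * features[idx]).sum(0) / w.sum()

feat = query_feature(np.array([0.5, 0.5, 0.5]))
print(feat.shape)   # (32,): feature to be decoded into occupancy/color
```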
arXiv Detail & Related papers (2023-11-14T09:17:15Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
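As a rough illustration of the 2D-to-3D fusion pattern such systems use (our generic sketch, not SeMLaPS itself), per-frame 2D semantic predictions can be back-projected with depth into a voxel grid that accumulates per-class evidence:

```python
# Sketch: back-project per-pixel semantic labels with depth (pinhole model)
# into a voxel grid of per-class votes; the argmax gives a semantic map.
import numpy as np

GRID, CLASSES, FX = 64, 5, 300.0                 # voxel grid size, labels, focal
votes = np.zeros((GRID, GRID, GRID, CLASSES))    # per-voxel class evidence

def fuse_frame(depth: np.ndarray, seg: np.ndarray) -> None:
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - w / 2) * z / FX                     # back-project pixels to
    y = (v - h / 2) * z / FX                     # camera coordinates
    vox = np.clip((np.stack([x, y, z], -1) * 10).astype(int), 0, GRID - 1)
    np.add.at(votes, (vox[..., 0], vox[..., 1], vox[..., 2], seg), 1.0)

depth = np.full((48, 64), 2.0)                   # toy frame: flat wall at 2 m
seg = np.zeros((48, 64), dtype=int)              # everything labeled class 0
fuse_frame(depth, seg)
label_map = votes.argmax(-1)                     # per-voxel semantic label
```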
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer [23.920690073252636]
We present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments.
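A toy illustration of the coarse-to-fine idea (our sketch; HSCNet++'s actual classification hierarchy and transformer are not reproduced here):

```python
# Sketch: first classify which coarse region of the scene a pixel belongs
# to, then regress a fine-grained offset within that region.
import numpy as np

REGIONS = np.array([[0.0, 0.0, 0.0],       # coarse region anchor centers
                    [4.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0]])

def predict_scene_coord(region_logits: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Combine a coarse classification with a fine regression."""
    region = int(region_logits.argmax())   # coarse: pick the region
    return REGIONS[region] + offset        # fine: add the regressed offset

coord = predict_scene_coord(np.array([0.1, 2.3, 0.4]),   # toy network outputs
                            np.array([0.5, -0.2, 1.1]))
print(coord)    # 3D scene coordinate used downstream for PnP pose solving
```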
arXiv Detail & Related papers (2023-05-05T15:00:14Z)
- Transforming Visual Scene Graphs to Image Captions [69.13204024990672]
We propose Transforming Scene Graphs (TSG) into more descriptive captions.
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) used for embedding scene graphs.
Each expert in TSG is also built on MHA and discriminates among the graph embeddings to generate different kinds of words.
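A minimal example of using multi-head attention as a graph layer over scene-graph nodes (a generic sketch of the idea, not the TSG model), with the adjacency serving as an attention mask:

```python
# Sketch: each scene-graph node attends only to its neighbors; the inverted
# adjacency is the attention mask (True = attention NOT allowed).
import torch

nodes = torch.randn(1, 6, 32)                       # (batch, num_nodes, dim)
adj = torch.eye(6, dtype=torch.bool)                # toy scene graph edges
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = True

mha = torch.nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
out, _ = mha(nodes, nodes, nodes, attn_mask=~adj)
print(out.shape)   # (1, 6, 32): attention-aggregated node embeddings
```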
arXiv Detail & Related papers (2023-05-03T15:18:37Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
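One plausible shape of such a neural-field map head (our assumption: classical tracking supplies poses while an MLP maps 3D points to geometry and semantics; not the paper's actual architecture):

```python
# Sketch: a shared MLP trunk over 3D points with two heads, one for
# geometry (occupancy/density) and one for per-point semantic logits.
import torch

class SemanticField(torch.nn.Module):
    def __init__(self, classes: int = 10):
        super().__init__()
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 64), torch.nn.ReLU())
        self.density = torch.nn.Linear(64, 1)          # geometry head
        self.semantics = torch.nn.Linear(64, classes)  # semantic head

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(xyz)
        return self.density(h), self.semantics(h)

field = SemanticField()
occ, sem = field(torch.rand(4096, 3))    # query points sampled along rays
print(occ.shape, sem.shape)              # (4096, 1) (4096, 10)
```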
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Graph Attention Network for Camera Relocalization on Dynamic Scenes [1.0398909602421018]
We devise a graph attention network-based approach that learns a scene triangle-mesh representation in order to estimate the camera pose of an image in a dynamic environment.
Our approach significantly improves the camera pose accuracy of the state-of-the-art method from $0.358$ to $0.506$ on the RIO10 benchmark for dynamic indoor camera relocalization.
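A generic GAT-style layer over mesh vertices, sketched for illustration (the authors' network and training setup are not reproduced here):

```python
# Sketch: per-edge attention scores over a mesh edge list, softmax-normalized
# per destination vertex, then used to aggregate neighbor features.
import torch

def gat_layer(x: torch.Tensor, edges: torch.Tensor,
              w: torch.nn.Linear, a: torch.nn.Linear) -> torch.Tensor:
    """x: (N, F) vertex features; edges: (E, 2) directed mesh edges."""
    h = w(x)                                        # (N, F) projected features
    src, dst = edges[:, 0], edges[:, 1]
    e = torch.nn.functional.leaky_relu(
        a(torch.cat([h[src], h[dst]], dim=1)).squeeze(1))  # per-edge score
    alpha = torch.zeros_like(e)
    for d in dst.unique():                          # softmax per destination node
        m = dst == d
        alpha[m] = torch.softmax(e[m], dim=0)
    out = torch.zeros_like(h)
    out.index_add_(0, dst, alpha.unsqueeze(1) * h[src])    # weighted aggregation
    return out

x = torch.randn(4, 16)                              # 4 mesh vertices
edges = torch.tensor([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2]])
print(gat_layer(x, edges, torch.nn.Linear(16, 16), torch.nn.Linear(32, 1)).shape)
```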
arXiv Detail & Related papers (2022-09-29T18:57:52Z)
- Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based subnetworks which are tuned to span the desired function space.
Our experiments show that NID can speed up the reconstruction of 2D images or 3D scenes by two orders of magnitude while using up to 98% less input data.
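A minimal mixture-of-experts implicit-network sketch (our reading of the NID idea, not the released code): a coordinate-conditioned gate blends several small coordinate MLPs:

```python
# Sketch: a gating network mixes coordinate-based expert MLPs to
# represent a 2D image signal (pixel coordinates -> RGB).
import torch

class NIDSketch(torch.nn.Module):
    def __init__(self, n_experts: int = 4, hidden: int = 32):
        super().__init__()
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(torch.nn.Linear(2, hidden), torch.nn.ReLU(),
                                torch.nn.Linear(hidden, 3))
            for _ in range(n_experts))               # small coordinate MLPs
        self.gate = torch.nn.Linear(2, n_experts)    # coordinate-conditioned gate

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(xy), dim=-1)                # (N, E) weights
        outs = torch.stack([e(xy) for e in self.experts], -1)   # (N, 3, E)
        return (outs * w.unsqueeze(1)).sum(-1)                  # blended RGB

model = NIDSketch()
rgb = model(torch.rand(1024, 2))   # query pixel coordinates of an image
print(rgb.shape)                   # (1024, 3)
```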
arXiv Detail & Related papers (2022-07-08T05:07:19Z)
- Pose-GNN: Camera Pose Estimation System Using Graph Neural Networks [12.12580095956898]
We propose a novel image-based localization system using graph neural networks (GNNs).
A pretrained ResNet50 convolutional neural network (CNN) is used to extract the important features for each image.
We show that using GNNs leads to enhanced performance in both indoor and outdoor environments.
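A sketch of that pipeline's shape (an assumption for illustration, not the authors' code): ResNet50 features become graph node attributes, and edges connect visually similar images:

```python
# Sketch: extract per-image features with ResNet50, then build a graph
# whose nodes are images and whose edges link similar views.
import torch
import torchvision

# weights=None keeps this sketch offline-runnable; the described system
# would load ImageNet-pretrained weights instead.
backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()          # expose the 2048-d pooled features
backbone.eval()

images = torch.rand(6, 3, 224, 224)        # toy batch of camera views
with torch.no_grad():
    feats = backbone(images)               # (6, 2048) per-image node features

sim = torch.nn.functional.cosine_similarity(
    feats.unsqueeze(1), feats.unsqueeze(0), dim=2)   # pairwise similarity
adj = (sim > 0.9).float()                  # connect visually similar images
# `feats` and `adj` would then feed a GNN that regresses the camera poses.
```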
arXiv Detail & Related papers (2021-03-17T04:40:02Z)
- Generalized Contrastive Optimization of Siamese Networks for Place Recognition [10.117451511942267]
We propose a Generalized Contrastive loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a Siamese CNN.
We demonstrate that Siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts.
Our models trained on MSLS outperform the state-of-the-art methods, including NetVLAD, NetVLAD-SARE, AP-GeM and Patch-NetVLAD, and generalize well on the Pittsburgh30k, Tokyo 24/7, RobotCar Seasons v2 and Extended CMU Seasons datasets.
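For concreteness, a hedged sketch of a contrastive loss generalized to a continuous similarity label psi in [0, 1] (the exact GCL formulation may differ):

```python
# Sketch: instead of a binary same/different label, a graded similarity
# psi weights the attracting and repelling terms of a contrastive loss.
import torch

def generalized_contrastive_loss(d: torch.Tensor, psi: torch.Tensor,
                                 margin: float = 1.0) -> torch.Tensor:
    """d: embedding distances; psi: continuous similarity labels in [0, 1]."""
    attract = psi * d.pow(2)                                   # pull similar pairs
    repel = (1 - psi) * torch.clamp(margin - d, min=0).pow(2)  # push dissimilar
    return 0.5 * (attract + repel).mean()

d = torch.tensor([0.2, 0.8, 1.5])       # distances between paired embeddings
psi = torch.tensor([0.9, 0.5, 0.0])     # graded ground-truth similarity
print(generalized_contrastive_loss(d, psi))
```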
arXiv Detail & Related papers (2021-03-11T12:32:05Z)
- Graphs, Convolutions, and Neural Networks: From Graph Filters to Graph Neural Networks [183.97265247061847]
We leverage graph signal processing to characterize the representation space of graph neural networks (GNNs).
We discuss the role of graph convolutional filters in GNNs and show that any architecture built with such filters has the fundamental properties of permutation equivariance and stability to changes in the topology.
We also study the use of GNNs in recommender systems and learning decentralized controllers for robot swarms.
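A worked example of the graph convolutional filters the paper analyzes: a polynomial in the graph shift operator S, y = h_0 x + h_1 S x + h_2 S^2 x, which is permutation equivariant by construction:

```python
# Sketch: apply a 3-tap polynomial graph filter to an impulse signal on a
# 4-node path graph; each S multiplication diffuses the signal one hop.
import numpy as np

S = np.array([[0, 1, 0, 0],       # adjacency (shift operator) of a path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])    # graph signal: impulse at node 0
h = [0.5, 0.3, 0.2]                    # filter taps h_0, h_1, h_2

y = np.zeros_like(x)
Skx = x.copy()
for hk in h:                           # accumulate h_k * S^k x
    y += hk * Skx
    Skx = S @ Skx                      # shift the signal one more hop
print(y)
```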
arXiv Detail & Related papers (2020-03-08T13:02:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.