SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for
Day-Night Place Recognition
- URL: http://arxiv.org/abs/2106.11481v1
- Date: Tue, 22 Jun 2021 02:05:32 GMT
- Title: SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for
Day-Night Place Recognition
- Authors: Sourav Garg and Michael Milford
- Abstract summary: Place Recognition is a crucial capability for mobile robot localization and navigation.
Recent VPR methods based on "sequential representations" have shown promising results.
We compare a 3D point cloud based method with image sequence based methods.
- Score: 31.714928102950594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Place Recognition is a crucial capability for mobile robot localization and
navigation. Image-based or Visual Place Recognition (VPR) is a challenging
problem as scene appearance and camera viewpoint can change significantly when
places are revisited. Recent VPR methods based on "sequential
representations" have shown promising results as compared to traditional
sequence score aggregation or single image based techniques. In parallel to
these endeavors, 3D point clouds based place recognition is also being explored
following the advances in deep learning based point cloud processing. However,
a key question remains: is an explicit 3D structure based place representation
always superior to an implicit "spatial" representation based on a sequence of
RGB images, which can inherently learn scene structure? In this extended
abstract, we attempt to compare these two types of methods by considering a
similar "metric span" to represent places. We compare a 3D point cloud based
method (PointNetVLAD) with image sequence based methods (SeqNet and others) and
showcase that image sequence based techniques approach, and can even surpass,
the performance achieved by point cloud based methods for a given metric span.
These performance variations can be attributed to differences in data richness
of input sensors as well as data accumulation strategies for a mobile robot.
While a perfect apples-to-apples comparison may not be feasible for these two
different modalities, the presented comparison takes a step in the direction of
answering deeper questions regarding spatial representations, relevant to
several applications like Autonomous Driving and Augmented/Virtual Reality.
Source code is publicly available at https://github.com/oravus/seqNet.
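To make the comparison setup concrete, below is a minimal sketch (not the authors' released code) of the retrieval protocol implied by the abstract: per-frame global descriptors are aggregated over a fixed metric span into a single place descriptor, and recall@1 is computed by nearest-neighbour search against a reference traverse. The simple averaging step stands in for learned aggregation such as SeqNet, and the function names, span, and localization radius are illustrative assumptions.

```python
# Minimal sketch of the sequence-based place recognition protocol.
# Assumes per-frame global descriptors (e.g., NetVLAD) are already computed;
# averaging over a metric span is a simplified stand-in for SeqNet.
import numpy as np

def sequence_descriptor(frame_descs, travel_m, span_m=20.0):
    """frame_descs: (N, D) L2-normalised descriptors; travel_m: (N,) cumulative
    distance travelled per frame; returns one descriptor per frame covering span_m."""
    out = []
    for i in range(len(frame_descs)):
        mask = np.abs(travel_m - travel_m[i]) <= span_m / 2.0
        d = frame_descs[mask].mean(axis=0)
        out.append(d / (np.linalg.norm(d) + 1e-12))
    return np.stack(out)

def recall_at_1(query_descs, ref_descs, query_xy, ref_xy, radius_m=25.0):
    """Fraction of queries whose top-1 retrieved reference lies within radius_m."""
    sims = query_descs @ ref_descs.T           # cosine similarity for unit vectors
    top1 = sims.argmax(axis=1)
    err = np.linalg.norm(query_xy - ref_xy[top1], axis=1)
    return float((err <= radius_m).mean())
```

Using the same span_m to accumulate a LiDAR submap for a PointNetVLAD-style descriptor would give a like-for-like retrieval metric across the two modalities.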
Related papers
- Robust 3D Point Clouds Classification based on Declarative Defenders [18.51700931775295]
3D point clouds are unstructured and sparse, while 2D images are structured and dense.
In this paper, we explore three distinct algorithms for mapping 3D point clouds into 2D images.
The proposed approaches demonstrate superior accuracy and robustness against adversarial attacks.
arXiv Detail & Related papers (2024-10-13T01:32:38Z)
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It uses the visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition [16.799067323119644]
We introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors.
We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images (a generic range-image sketch follows this list).
We also design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images.
arXiv Detail & Related papers (2024-03-27T17:01:10Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations [27.45001809414096]
PosDiffNet is a model for point cloud registration in 3D computer vision.
We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features.
We employ the multi-level correspondence derived from the high feature similarity scores to facilitate alignment between point clouds.
We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations.
arXiv Detail & Related papers (2024-01-06T08:58:15Z)
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving [80.14669385741202]
Vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks.
ViTs are notoriously hard to train and require a lot of training data to learn powerful representations.
We show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and Semantic KITTI.
arXiv Detail & Related papers (2023-01-24T18:50:48Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Robust Place Recognition using an Imaging Lidar [45.37172889338924]
We propose a methodology for robust, real-time place recognition using an imaging lidar.
Our method is truly invariant and can handle reverse and upside-down revisits.
arXiv Detail & Related papers (2021-03-03T01:08:31Z)
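In the spirit of the Field of View transformation mentioned in the ModaLink entry above, the sketch below shows one common, generic way to convert a LiDAR point cloud into an image-like modality: a spherical range-image projection. It is not ModaLink's module; the image resolution and vertical field of view are illustrative values.

```python
# Generic spherical projection of a LiDAR point cloud into a range image.
# Not ModaLink's implementation; resolution and FoV values are illustrative.
import numpy as np

def pointcloud_to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) x, y, z in the sensor frame; returns an (h, w) range image."""
    r = np.linalg.norm(points, axis=1) + 1e-8              # range per point
    yaw = np.arctan2(points[:, 1], points[:, 0])            # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(points[:, 2] / r, -1.0, 1.0)) # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    u = ((np.pi - yaw) / (2.0 * np.pi) * w).astype(int)              # column from azimuth
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)     # row from elevation
    u, v = np.clip(u, 0, w - 1), np.clip(v, 0, h - 1)

    image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)                  # write far-to-near so nearer points win
    image[v[order], u[order]] = r[order]
    return image
```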
This list is automatically generated from the titles and abstracts of the papers on this site.