Poses as Queries: Image-to-LiDAR Map Localization with Transformers
- URL: http://arxiv.org/abs/2305.04298v1
- Date: Sun, 7 May 2023 14:57:58 GMT
- Title: Poses as Queries: Image-to-LiDAR Map Localization with Transformers
- Authors: Jinyu Miao, Kun Jiang, Yunlong Wang, Tuopu Wen, Zhongyang Xiao, Zheng
Fu, Mengmeng Yang, Maolin Liu, Diange Yang
- Abstract summary: High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks.
Estimating pose by finding correspondences between such cross-modal sensor data is challenging.
We propose a novel Transformer-based neural network to register 2D images into a 3D LiDAR map in an end-to-end manner.
- Score: 5.704968411509063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-precision vehicle localization with commercial setups is a crucial
technique for high-level autonomous driving tasks. Localization with a
monocular camera in a LiDAR map is a newly emerged approach that achieves a
promising balance between cost and accuracy, but estimating pose by finding
correspondences between such cross-modal sensor data is challenging, thereby
damaging the localization accuracy. In this paper, we address the problem by
proposing a novel Transformer-based neural network to register 2D images into a
3D LiDAR map in an end-to-end manner. Poses are implicitly represented as
high-dimensional feature vectors called pose queries and are iteratively
updated by interacting with relevant information retrieved from cross-modal
features using an attention mechanism in the proposed POse Estimator Transformer
(POET) module. Moreover, we apply a multiple-hypotheses aggregation method that
estimates the final pose by performing parallel optimization on multiple
randomly initialized pose queries to reduce network uncertainty.
Comprehensive analysis and experimental results on public benchmarks show
that the proposed image-to-LiDAR map localization network achieves
state-of-the-art performance in challenging cross-modal localization tasks.
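The paper's implementation is not reproduced here, but the two mechanisms the abstract describes, pose queries refined by cross-attention over cross-modal features and aggregation over multiple randomly initialized queries, can be sketched. Below is a minimal PyTorch sketch; all module names, dimensions, the translation-plus-quaternion pose head, and the averaging-based aggregation are illustrative assumptions, not the published POET architecture.

```python
import torch
import torch.nn as nn

class PoseQueryDecoderLayer(nn.Module):
    """One refinement step: pose queries cross-attend to fused cross-modal features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, feats):
        # queries: (B, Q, dim) pose hypotheses; feats: (B, N, dim) image/LiDAR-map features
        attended, _ = self.cross_attn(queries, feats, feats)
        queries = self.norm1(queries + attended)
        return self.norm2(queries + self.ffn(queries))

class PoseEstimator(nn.Module):
    def __init__(self, dim=256, num_queries=8, num_layers=4):
        super().__init__()
        # Multiple randomly initialized pose queries act as parallel hypotheses.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.layers = nn.ModuleList(PoseQueryDecoderLayer(dim) for _ in range(num_layers))
        self.head = nn.Linear(dim, 7)  # translation (3) + quaternion (4), an assumption

    def forward(self, feats):
        q = self.queries.unsqueeze(0).expand(feats.shape[0], -1, -1)
        for layer in self.layers:  # iterative query refinement
            q = layer(q, feats)
        poses = self.head(q)       # (B, Q, 7): one pose estimate per hypothesis
        # Aggregate hypotheses: mean translation, normalized mean quaternion (assumed).
        t = poses[..., :3].mean(dim=1)
        quat = nn.functional.normalize(poses[..., 3:].mean(dim=1), dim=-1)
        return t, quat

t, quat = PoseEstimator()(torch.randn(2, 1024, 256))  # placeholder fused features
print(t.shape, quat.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```

Averaging is only one plausible way to combine hypotheses; the abstract describes parallel optimization over the queries but does not specify the aggregation, so that detail is assumed here.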
Related papers
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
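A minimal sketch of the dual-branch idea described for PDCNet above: a Transformer branch supplies global context from the fringe image while a CNN branch supplies local detail from the speckle image, fused before a per-pixel head. The layer sizes, patch size, concatenation-based fusion, and output head are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # CNN branch: local details from the speckle image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        # Transformer branch: global perception over fringe-image patches.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=8, stride=8)
        encoder_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Conv2d(2 * dim, 1, 1)  # per-pixel output (assumed)

    def forward(self, fringe, speckle):
        local = self.cnn(speckle)          # (B, dim, H, W) local features
        tokens = self.patch_embed(fringe)  # (B, dim, H/8, W/8) patch tokens
        B, C, h, w = tokens.shape
        glob = self.transformer(tokens.flatten(2).transpose(1, 2))
        glob = glob.transpose(1, 2).reshape(B, C, h, w)
        glob = nn.functional.interpolate(glob, size=local.shape[-2:], mode="bilinear")
        return self.head(torch.cat([local, glob], dim=1))  # fuse and predict

out = DualBranchNet()(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```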
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
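The second point in the UPIDet entry above, enhancing the point cloud backbone through gradients backpropagated from the image branch's objectives, can be illustrated with a toy example: any objective attached downstream of the point features sends gradients into the point backbone. Both modules and the loss below are placeholders, not the UPIDet architecture.

```python
import torch
import torch.nn as nn

point_backbone = nn.Linear(3, 32)  # stand-in for a point feature extractor
image_branch = nn.Linear(32, 2)    # stand-in for a 2D auxiliary head

point_feats = point_backbone(torch.randn(128, 3))
aux_loss = image_branch(point_feats).pow(2).mean()  # placeholder 2D objective
aux_loss.backward()

# The point backbone received gradients from the image branch's objective.
print(point_backbone.weight.grad.abs().sum() > 0)  # tensor(True)
```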
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
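As a rough illustration of sparse attention with direct feature queries in the spirit of the DETR4D entry above, the sketch below lets each object query attend only to its top-k most similar feature tokens instead of the full multi-view feature map. The actual DETR4D sampling scheme differs; this top-k simplification is an assumption.

```python
import torch

def sparse_attention(queries, feats, k=16):
    # queries: (B, Q, C) object queries; feats: (B, N, C) flattened multi-view tokens
    scores = queries @ feats.transpose(1, 2)  # (B, Q, N) similarity scores
    topk, idx = scores.topk(k, dim=-1)        # keep only k tokens per query
    weights = topk.softmax(dim=-1)            # (B, Q, k) attention weights
    gathered = torch.gather(
        feats.unsqueeze(1).expand(-1, queries.shape[1], -1, -1),
        2, idx.unsqueeze(-1).expand(-1, -1, -1, feats.shape[-1]))
    return (weights.unsqueeze(-1) * gathered).sum(dim=2)  # (B, Q, C) updated queries

q = torch.randn(2, 100, 256)      # 100 object queries
f = torch.randn(2, 6 * 400, 256)  # tokens from 6 surround-view cameras
print(sparse_attention(q, f).shape)  # torch.Size([2, 100, 256])
```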
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
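The non-linear least squares formulation mentioned in the RNNPose entry above can be made concrete with a toy Gauss-Newton loop. The sketch refines a 3-parameter 2D rigid pose against synthetic correspondences; RNNPose itself refines 6-DoF object poses against an estimated correspondence field, so everything below is an illustrative simplification.

```python
import torch

def transform(pts, pose):
    """Apply a 2D rigid transform parameterized as (theta, tx, ty)."""
    c, s = torch.cos(pose[0]), torch.sin(pose[0])
    R = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    return pts @ R.T + pose[1:]

# Synthetic correspondences: in RNNPose these come from the estimated
# correspondence field; here we generate them from a known ground-truth pose.
src = torch.randn(50, 2)
gt = torch.tensor([0.3, 0.5, -0.2])
tgt = transform(src, gt)

residual = lambda p: (transform(src, p) - tgt).reshape(-1)
pose = torch.zeros(3)  # initialize at identity
for _ in range(10):    # Gauss-Newton refinement iterations
    r = residual(pose)
    J = torch.autograd.functional.jacobian(residual, pose)  # (100, 3) Jacobian
    dp = torch.linalg.solve(J.T @ J, -(J.T @ r))            # normal equations
    pose = pose + dp
print(pose)  # converges to ~(0.3, 0.5, -0.2)
```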
- Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling [67.75095378830694]
In this paper, we aim to alleviate the influence of unreliable structures in the training, inference and mapping phases.
We develop a two-stage odometry estimation network, where we obtain the ego-motion by estimating a set of sub-region transformations.
Our two-frame odometry outperforms the previous state of the art by 16%/12% in terms of translational/rotational errors.
arXiv Detail & Related papers (2022-02-27T12:52:27Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale gigapixel photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the inputs, which leads to degraded accuracy under cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
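The key insight of the last entry, that localizing a query image against different reference images should yield the same absolute pose, translates directly into a consistency loss. A minimal sketch, assuming 4x4 homogeneous pose matrices and a Frobenius-norm penalty (both assumptions, not the paper's exact formulation):

```python
import torch

def absolute_pose(T_world_ref, T_rel):
    """Compose a reference's known absolute pose with a predicted relative pose.

    Assumed convention: T_world_ref maps reference-camera coordinates to world
    coordinates, and T_rel maps query-camera coordinates into the reference
    frame, so T_world_query = T_world_ref @ T_rel.
    """
    return T_world_ref @ T_rel

def transform_consistency_loss(T_world_ref_a, T_rel_a, T_world_ref_b, T_rel_b):
    # The same query localized against two different references should land on
    # one absolute pose; penalize the discrepancy between the two estimates.
    diff = absolute_pose(T_world_ref_a, T_rel_a) - absolute_pose(T_world_ref_b, T_rel_b)
    return diff.pow(2).sum()

# Self-consistent example: both branches agree, so the loss is ~0.
T_world_query = torch.eye(4); T_world_query[0, 3] = 2.0
T_world_ref_a = torch.eye(4); T_world_ref_a[1, 3] = 1.0
T_world_ref_b = torch.eye(4); T_world_ref_b[0, 3] = -1.0
T_rel_a = torch.linalg.inv(T_world_ref_a) @ T_world_query
T_rel_b = torch.linalg.inv(T_world_ref_b) @ T_world_query
print(transform_consistency_loss(T_world_ref_a, T_rel_a, T_world_ref_b, T_rel_b))  # tensor(0.)
```

Because the loss compares two pose estimates against each other rather than against ground truth, it can supervise the relative-pose regressor without accurate image correspondences, which is the self-supervised aspect the summary highlights.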