Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing
- URL: http://arxiv.org/abs/2406.07833v1
- Date: Wed, 12 Jun 2024 03:02:54 GMT
- Title: Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing
- Authors: Sina Tayebati, Theja Tulabandhula, Amit R. Trivedi,
- Abstract summary: We propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on the extensive training of the environment or have limited consequence to the overall prediction accuracy.
Our proposed generative pre-training strategy for this purpose, called as radially masked autoencoding (R-MAE), can also be readily implemented in a typical LiDAR system by selectively activating and controlling the laser power for randomly generated angular regions during on-field operations.
- Score: 0.6340101348986665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on the extensive training of the environment or have limited consequence to the overall prediction accuracy. Therefore, the proposed methodology trades off sensing energy with training data for low-power robotics and autonomous navigation to operate frugally with sensors, extending their lifetime on a single battery charge. Our proposed generative pre-training strategy for this purpose, called as radially masked autoencoding (R-MAE), can also be readily implemented in a typical LiDAR system by selectively activating and controlling the laser power for randomly generated angular regions during on-field operations. Our extensive evaluations show that pre-training with R-MAE enables focusing on the radial segments of the data, thereby capturing spatial relationships and distances between objects more effectively than conventional procedures. Therefore, the proposed methodology not only reduces sensing energy but also improves prediction accuracy. For example, our extensive evaluations on Waymo, nuScenes, and KITTI datasets show that the approach achieves over a 5% average precision improvement in detection tasks across datasets and over a 4% accuracy improvement in transferring domains from Waymo and nuScenes to KITTI. In 3D object detection, it enhances small object detection by up to 4.37% in AP at moderate difficulty levels in the KITTI dataset. Even with 90% radial masking, it surpasses baseline models by up to 5.59% in mAP/mAPH across all object classes in the Waymo dataset. Additionally, our method achieves up to 3.17% and 2.31% improvements in mAP and NDS, respectively, on the nuScenes dataset, demonstrating its effectiveness with both single and fused LiDAR-camera modalities. https://github.com/sinatayebati/Radial_MAE.
Related papers
- Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps [8.897884780881535]
LiDAR-based 3D object detectors often fail to adapt well to target domains with different sensor configurations.
Recent studies suggest that pre-trained backbones can be learned in a self-supervised manner with large-scale unlabeled LiDAR frames.
We propose a novel method, called Domain Adaptive Distill-Tuning (DADT), to adapt a pre-trained model with limited target data.
arXiv Detail & Related papers (2024-10-02T08:22:42Z) - TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training [21.56675189346088]
We introduce Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture.
TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density.
They utilize the inherent isotropic radiation of LiDAR to enhance local representation.
Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI.
arXiv Detail & Related papers (2024-08-25T17:59:17Z) - Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z) - Continuously Optimizing Radar Placement with Model Predictive Path Integrals [16.148347437965683]
Continuously optimizing sensor placement is essential for precise target localization in various military and civilian applications.
We employ a range measurement model that incorporates radar parameters and radar-target distance.
We visualize the evolving geometry of radars and targets over time, highlighting areas of highest measurement information gain.
arXiv Detail & Related papers (2024-05-29T11:25:53Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - Monocular 3D Object Detection with LiDAR Guided Semi Supervised Active
Learning [2.16117348324501]
We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG)
We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase.
Our training strategy attains the top place in KITTI 3D and birds-eye-view (BEV) monocular object detection official benchmarks by improving the BEV Average Precision (AP) by 2.02.
arXiv Detail & Related papers (2023-07-17T11:55:27Z) - Improving LiDAR 3D Object Detection via Range-based Point Cloud Density
Optimization [13.727464375608765]
Existing 3D object detectors tend to perform well on the point cloud regions closer to the LiDAR sensor as opposed to on regions that are farther away.
We observe that there is a learning bias in detection models towards the dense objects near the sensor and show that the detection performance can be improved by simply manipulating the input point cloud density at different distance ranges.
arXiv Detail & Related papers (2023-06-09T04:11:43Z) - MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based
Self-Supervised Pre-Training [58.07391711548269]
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z) - Collaborative Learning with a Drone Orchestrator [79.75113006257872]
A swarm of intelligent wireless devices train a shared neural network model with the help of a drone.
The proposed framework achieves a significant speedup in training, leading to an average 24% and 87% saving in the drone hovering time.
arXiv Detail & Related papers (2023-03-03T23:46:25Z) - SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural
Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.