On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR
- URL: http://arxiv.org/abs/2411.00600v1
- Date: Fri, 01 Nov 2024 14:01:54 GMT
- Title: On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR
- Authors: Li Li,
- Abstract summary: 3D LiDAR point cloud data is crucial for scene perception in computer vision, robotics, and autonomous driving.
We present DurLAR, the first high-fidelity 128-channel 3D LiDAR dataset featuring panoramic ambient (near infrared) and reflectivity imagery.
To improve the segmentation accuracy, we introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture.
- Score: 4.606106768645647
- License:
- Abstract: 3D LiDAR point cloud data is crucial for scene perception in computer vision, robotics, and autonomous driving. Geometric and semantic scene understanding, involving 3D point clouds, is essential for advancing autonomous driving technologies. However, significant challenges remain, particularly in improving the overall accuracy (e.g., segmentation accuracy, depth estimation accuracy, etc.) and efficiency of these systems. To address the challenge in terms of accuracy related to LiDAR-based tasks, we present DurLAR, the first high-fidelity 128-channel 3D LiDAR dataset featuring panoramic ambient (near infrared) and reflectivity imagery. To improve efficiency in 3D segmentation while ensuring the accuracy, we propose a novel pipeline that employs a smaller architecture, requiring fewer ground-truth annotations while achieving superior segmentation accuracy compared to contemporary approaches. To improve the segmentation accuracy, we introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. All contributions have been accepted by peer-reviewed conferences, underscoring the advancements in both accuracy and efficiency in 3D LiDAR applications for autonomous driving. Full abstract: https://etheses.dur.ac.uk/15738/.
Related papers
- Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation [32.50849425431012]
For autonomous cars equipped with multi-camera and LiDAR, it is critical to aggregate multi-sensor information into a unified 3D space for accurate and robust predictions.
Recent methods are mainly built on the 2D-to-3D transformation that relies on sensor calibration to project the 2D image information into the 3D space.
In this work, we propose a calibration-free spatial transformation based on vanilla attention to implicitly model the spatial correspondence.
arXiv Detail & Related papers (2024-11-19T02:40:42Z) - RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation [22.877384781595556]
We introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture.
RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density.
We propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings.
arXiv Detail & Related papers (2024-07-14T10:59:34Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding [16.01111155569546]
We introduce the 3D visual grounding task into parallel LiDARs and present a novel human-computer interaction paradigm for LiDAR systems.
We propose Talk2LiDAR, a large-scale benchmark dataset tailored for 3D visual grounding in autonomous driving.
Our experiments on Talk2Car-3D and Talk2LiDAR datasets demonstrate the superior performance of BEVing.
arXiv Detail & Related papers (2024-05-24T07:00:45Z) - Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured
Traffic Scenarios [0.0]
We propose OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features.
We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth.
arXiv Detail & Related papers (2023-07-20T15:06:44Z) - LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z) - Language-Guided 3D Object Detection in Point Cloud for Autonomous
Driving [91.91552963872596]
We propose a new multi-modal visual grounding task, termed LiDAR Grounding.
It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector.
Our work offers a deeper insight into the LiDAR-based grounding task and we expect it presents a promising direction for the autonomous driving community.
arXiv Detail & Related papers (2023-05-25T06:22:10Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.