Multi-View Fusion of Sensor Data for Improved Perception and Prediction
in Autonomous Driving
- URL: http://arxiv.org/abs/2008.11901v2
- Date: Tue, 19 Oct 2021 00:36:07 GMT
- Title: Multi-View Fusion of Sensor Data for Improved Perception and Prediction
in Autonomous Driving
- Authors: Sudeep Fadadu, Shreyash Pandey, Darshan Hegde, Yi Shi, Fang-Chieh
Chou, Nemanja Djuric, Carlos Vallespi-Gonzalez
- Abstract summary: We present an end-to-end method for object detection and trajectory prediction utilizing multi-view representations of LiDAR and camera images.
Our model builds on a state-of-the-art Bird's-Eye View (BEV) network that fuses voxelized features from a sequence of historical LiDAR data.
We extend this model with additional LiDAR Range-View (RV) features that use the raw LiDAR information in its native, non-quantized representation.
- Score: 11.312620949473938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an end-to-end method for object detection and trajectory
prediction utilizing multi-view representations of LiDAR returns and camera
images. In this work, we recognize the strengths and weaknesses of different
view representations, and we propose an efficient and generic fusing method
that aggregates benefits from all views. Our model builds on a state-of-the-art
Bird's-Eye View (BEV) network that fuses voxelized features from a sequence of
historical LiDAR data as well as a rasterized high-definition map to perform
detection and prediction tasks. We extend this model with additional LiDAR
Range-View (RV) features that use the raw LiDAR information in its native,
non-quantized representation. The RV feature map is projected into BEV and
fused with the BEV features computed from LiDAR and high-definition map. The
fused features are then further processed to output the final detections and
trajectories, within a single end-to-end trainable network. In addition, the RV
fusion of LiDAR and camera is performed in a straightforward and
computationally efficient manner using this framework. The proposed multi-view
fusion approach improves the state-of-the-art on proprietary large-scale
real-world data collected by a fleet of self-driving vehicles, as well as on
the public nuScenes data set, with a minimal increase in computational cost.
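To make the fusion step concrete, the sketch below illustrates one way the projection described in the abstract could be realized: per-point range-view (RV) features are scattered into a bird's-eye-view (BEV) grid and concatenated with the existing BEV features before a convolutional fusion layer. This is a minimal, hypothetical PyTorch example; the tensor names, grid resolution, and channel counts are assumptions made for illustration and are not taken from the paper's implementation.

```python
# Illustrative sketch (not the paper's code): scatter per-point RV features
# into a BEV grid and fuse them with existing BEV features by concatenation
# followed by a convolution. Grid size, cell size, and channel counts are
# assumed values chosen only for the example.
import torch
import torch.nn as nn


def project_rv_to_bev(rv_point_feats, points_xy, grid_size=(512, 512),
                      cell_size=0.2, xy_offset=(-51.2, -51.2)):
    """Scatter per-point RV features (N, C) into a BEV grid (C, H, W).

    points_xy: (N, 2) LiDAR point coordinates in the ego frame (meters).
    Points falling into the same cell are mean-pooled.
    """
    n, c = rv_point_feats.shape
    h, w = grid_size
    # Convert metric coordinates to integer BEV cell indices.
    cols = ((points_xy[:, 0] - xy_offset[0]) / cell_size).long().clamp(0, w - 1)
    rows = ((points_xy[:, 1] - xy_offset[1]) / cell_size).long().clamp(0, h - 1)
    flat_idx = rows * w + cols                      # (N,)

    bev = rv_point_feats.new_zeros(h * w, c)
    count = rv_point_feats.new_zeros(h * w, 1)
    bev.index_add_(0, flat_idx, rv_point_feats)     # sum features per cell
    count.index_add_(0, flat_idx, rv_point_feats.new_ones(n, 1))
    bev = bev / count.clamp(min=1.0)                # average per cell
    return bev.t().reshape(c, h, w)


class BEVFusion(nn.Module):
    """Concatenate projected RV features with BEV features and mix with a conv."""

    def __init__(self, bev_channels, rv_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(bev_channels + rv_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, bev_feats, rv_feats_in_bev):
        # bev_feats: (B, C_bev, H, W); rv_feats_in_bev: (B, C_rv, H, W)
        return self.fuse(torch.cat([bev_feats, rv_feats_in_bev], dim=1))
```

Under this reading, camera features that have already been fused into the RV map can ride along through the same projection, which is consistent with the abstract's claim that the RV fusion of LiDAR and camera stays straightforward and computationally efficient.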
Related papers
- Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
arXiv Detail & Related papers (2024-08-01T08:32:03Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z) - CVTNet: A Cross-View Transformer Network for Place Recognition Using
LiDAR Data [15.144590078316252]
We propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data.
We evaluate our approach on three datasets collected with different sensor setups and environmental conditions.
arXiv Detail & Related papers (2023-02-03T11:37:20Z) - BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud
Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - BEVerse: Unified Perception and Prediction in Birds-Eye-View for
Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z) - Mapping LiDAR and Camera Measurements in a Dual Top-View Grid
Representation Tailored for Automated Vehicles [3.337790639927531]
We present a generic evidential grid mapping pipeline designed for imaging sensors such as LiDARs and cameras.
Our grid-based evidential model contains semantic estimates for cell occupancy and ground separately.
Our method estimates cell occupancy robustly and with a high level of detail while maximizing efficiency and minimizing the dependency on external processing modules.
arXiv Detail & Related papers (2022-04-16T23:51:20Z) - MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting
through Multi-View Fusion of LiDAR Data [4.8061970432391785]
We propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data.
We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets.
arXiv Detail & Related papers (2021-04-21T21:29:08Z) - RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint
3D Object Detection and Motion Forecasting [13.544498422625448]
We present RV-FuseNet, a novel end-to-end approach for joint detection and trajectory estimation.
Instead of the widely used bird's eye view (BEV) representation, we utilize the native range view (RV) representation of LiDAR data.
We show that our approach significantly improves motion forecasting performance over the existing state-of-the-art.
arXiv Detail & Related papers (2020-05-21T19:22:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.