MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting
through Multi-View Fusion of LiDAR Data
- URL: http://arxiv.org/abs/2104.10772v1
- Date: Wed, 21 Apr 2021 21:29:08 GMT
- Title: MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting
through Multi-View Fusion of LiDAR Data
- Authors: Ankit Laddha, Shivam Gautam, Stefan Palombo, Shreyash Pandey, Carlos
Vallespi-Gonzalez
- Abstract summary: We propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data.
We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets.
- Score: 4.8061970432391785
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we propose MVFuseNet, a novel end-to-end method for
joint object detection and motion forecasting from a temporal sequence of LiDAR
data. Most existing methods operate in a single view by projecting data in
either range view (RV) or bird's eye view (BEV). In contrast, we propose a
method that effectively utilizes both RV and BEV for spatio-temporal feature
learning as part of a temporal fusion network as well as for multi-scale
feature learning in the backbone network. Further, we propose a novel
sequential fusion approach that effectively utilizes multiple views in the
temporal fusion network. We show the benefits of our multi-view approach for
the tasks of detection and motion forecasting on two large-scale self-driving
data sets, achieving state-of-the-art results. Furthermore, we show that
MVFuseNet scales well to large operating ranges while maintaining real-time
performance.
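Since the abstract hinges on the two LiDAR projections, a minimal sketch of how a point cloud maps into range view and bird's eye view may help. The grid size, resolution, and sensor field of view below are assumed values for illustration, not MVFuseNet's actual configuration.

```python
import numpy as np

def lidar_to_views(points, bev_range=50.0, bev_res=0.25,
                   rv_height=64, rv_width=512):
    """Project one LiDAR sweep (N, 3+) with x, y, z columns into a BEV
    occupancy grid and an RV range image. All sizes are illustrative."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    # Bird's eye view: quantize x/y into a top-down grid around the sensor.
    n_cells = int(2 * bev_range / bev_res)
    bev = np.zeros((n_cells, n_cells), dtype=np.float32)
    ix = ((x + bev_range) / bev_res).astype(int)
    iy = ((y + bev_range) / bev_res).astype(int)
    ok = (ix >= 0) & (ix < n_cells) & (iy >= 0) & (iy < n_cells)
    bev[iy[ok], ix[ok]] = 1.0  # occupancy; real networks keep richer cell features

    # Range view: spherical projection indexed by azimuth and elevation.
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    col = ((np.arctan2(y, x) + np.pi) / (2 * np.pi) * rv_width).astype(int) % rv_width
    elev = np.arcsin(z / np.maximum(r, 1e-6))
    fov_up, fov_down = np.radians(2.0), np.radians(-24.8)  # assumed sensor FOV
    row = ((fov_up - elev) / (fov_up - fov_down) * rv_height).astype(int)
    rv = np.zeros((rv_height, rv_width), dtype=np.float32)
    inside = (row >= 0) & (row < rv_height)
    rv[row[inside], col[inside]] = r[inside]  # one range value per pixel

    return bev, rv
```

A temporal fusion network in the spirit of the abstract would run feature extractors over both outputs and exchange features between the views; the projection above is only the data-preparation step.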
Related papers
- Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
arXiv Detail & Related papers (2024-08-01T08:32:03Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
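A toy version of the memory-bank idea described above, with hypothetical shapes and a simple attention-style read-out standing in for the paper's actual interaction module:

```python
import torch
from collections import deque

class FeatureMemoryBank:
    """Toy memory bank for streaming tracking: only the current frame is
    encoded per step, then matched against stored historical features."""
    def __init__(self, capacity: int = 8):
        self.buffer = deque(maxlen=capacity)  # oldest frames drop out automatically

    def update(self, feats: torch.Tensor) -> None:
        self.buffer.append(feats.detach())    # store a (C,) per-frame descriptor

    def interact(self, current: torch.Tensor) -> torch.Tensor:
        if not self.buffer:
            return current
        history = torch.stack(list(self.buffer))           # (T, C)
        weights = torch.softmax(history @ current, dim=0)  # similarity to current frame
        return current + weights @ history                 # attention-style read-out

bank = FeatureMemoryBank()
for t in range(10):                 # one encoder pass per incoming frame
    frame_feat = torch.randn(128)
    fused = bank.interact(frame_feat)
    bank.update(frame_feat)
```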
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass point-, voxel-, and multi-view-fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
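The "many-to-one" problem is easy to quantify: many 3D points can fall onto the same range-image pixel, so a one-value-per-pixel image discards the rest. A small diagnostic under an assumed 64x512 image and sensor field of view:

```python
import numpy as np

def rv_collision_rate(points, rv_height=64, rv_width=512):
    """Fraction of points lost if each range-image pixel keeps only one point.
    Assumed image size and FOV; demonstrates many-to-one mapping, not RangeFormer."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    col = ((np.arctan2(y, x) + np.pi) / (2 * np.pi) * rv_width).astype(int) % rv_width
    elev = np.arcsin(z / np.maximum(r, 1e-6))
    fov_up, fov_down = np.radians(2.0), np.radians(-24.8)  # assumed sensor FOV
    row = np.clip(((fov_up - elev) / (fov_up - fov_down) * rv_height).astype(int),
                  0, rv_height - 1)
    _, counts = np.unique(row * rv_width + col, return_counts=True)
    return 1.0 - counts.size / len(points)  # 0.0 means every point got its own pixel
```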
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- VS-Net: Multiscale Spatiotemporal Features for Lightweight Video Salient Document Detection [0.2578242050187029]
We propose VS-Net, which captures multi-scale spatiotemporal information with the help of dilated depth-wise separable convolution and Approximation Rank Pooling.
Our model generates saliency maps considering both the background and foreground, making it perform better in challenging scenarios.
Extensive experiments conducted on the benchmark MIDV-500 dataset show that VS-Net outperforms state-of-the-art approaches in both speed and robustness.
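Dilated depth-wise separable convolution is a standard building block; a PyTorch version is below. Channel counts and dilation are placeholders, and Approximation Rank Pooling is not sketched here.

```python
import torch
import torch.nn as nn

class DilatedDepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv with dilation, then a 1x1 pointwise conv.
    Generic block; VS-Net's exact configuration may differ."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)  # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DilatedDepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 128, 128))  # spatial size preserved by padding
```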
arXiv Detail & Related papers (2023-01-11T13:07:31Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
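A minimal DETR-style decoder step conveys the "direct feature query" idea: a fixed set of learnable queries cross-attends to flattened multi-view features. This is a generic sketch; DETR4D's sparse-attention design is more involved.

```python
import torch
import torch.nn as nn

class QueryDecoderLayer(nn.Module):
    """One refinement step: learnable object queries attend to image tokens."""
    def __init__(self, dim=256, num_queries=300, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, view_feats):
        # view_feats: (B, N_tokens, dim), multi-view features flattened into tokens
        q = self.queries.unsqueeze(0).expand(view_feats.size(0), -1, -1)
        attended, _ = self.cross_attn(q, view_feats, view_feats)
        return attended + self.ffn(attended)  # each query decodes one object candidate
```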
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
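The usual core of joint detection and tracking is an affinity matrix between per-detection embeddings in consecutive frames. A bare-bones cosine version, with a greedy readout in place of the paper's attention-guided matching:

```python
import torch
import torch.nn.functional as F

def affinity_matrix(feats_t, feats_t1):
    """Pairwise cosine affinities between (M, D) and (N, D) detection
    embeddings -> (M, N) matrix. Hypothetical shapes for illustration."""
    return F.normalize(feats_t, dim=1) @ F.normalize(feats_t1, dim=1).T

aff = affinity_matrix(torch.randn(5, 128), torch.randn(7, 128))
matches = aff.argmax(dim=1)  # greedy: best frame t+1 match for each frame t detection
```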
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
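The multi-task setup amounts to separate heads over shared BEV features; a schematic with made-up channel counts:

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Shared BEV features feed detection, map segmentation, and motion
    prediction heads. Output channels here are illustrative placeholders."""
    def __init__(self, dim=256):
        super().__init__()
        self.detect = nn.Conv2d(dim, 10, 1)   # box regression + class logits
        self.seg_map = nn.Conv2d(dim, 4, 1)   # semantic map classes
        self.motion = nn.Conv2d(dim, 2, 1)    # per-cell future displacement

    def forward(self, bev_feats):
        return self.detect(bev_feats), self.seg_map(bev_feats), self.motion(bev_feats)

heads = MultiTaskHeads()
det, seg, motion = heads(torch.randn(1, 256, 200, 200))
```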
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-arts for both the VOS and video salient object detection tasks.
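"Full-duplex" here means the appearance and motion streams transmit and receive features in the same step rather than one direction at a time. A hypothetical minimal layer illustrating that symmetry:

```python
import torch
import torch.nn as nn

class BidirectionalExchange(nn.Module):
    """Both streams send and receive in one step, then fuse residually.
    Sketched only from the summary above, not FSNet's actual layers."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_motion = nn.Conv2d(dim, dim, 1)
        self.to_appearance = nn.Conv2d(dim, dim, 1)

    def forward(self, appearance, motion):
        # Simultaneous cross-modal transmission and receiving.
        return (appearance + self.to_appearance(motion),
                motion + self.to_motion(appearance))

layer = BidirectionalExchange()
a, m = layer(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```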
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving [11.312620949473938]
We present an end-to-end method for object detection and trajectory prediction utilizing multi-view representations of LiDAR and camera images.
Our model builds on a state-of-the-art Bird's-Eye View (BEV) network that fuses voxelized features from a sequence of historical LiDAR data.
We extend this model with additional LiDAR Range-View (RV) features that use the raw LiDAR information in its native, non-quantized representation.
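One common way to combine the two representations is to gather per-point features from the RV feature map and scatter them into BEV cells. A hypothetical helper, using loop-based max-pooling for clarity rather than speed:

```python
import numpy as np

def scatter_rv_features_to_bev(points_xy, point_feats, bev_range=50.0, res=0.5):
    """Max-pool per-point features (e.g., sampled from an RV feature map)
    into BEV cells. Shapes and resolution are illustrative assumptions."""
    n = int(2 * bev_range / res)
    bev = np.zeros((n, n, point_feats.shape[1]), dtype=np.float32)
    ix = ((points_xy[:, 0] + bev_range) / res).astype(int)
    iy = ((points_xy[:, 1] + bev_range) / res).astype(int)
    ok = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    for i, j, f in zip(iy[ok], ix[ok], point_feats[ok]):
        bev[i, j] = np.maximum(bev[i, j], f)  # per-cell max over all points in it
    return bev
```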
arXiv Detail & Related papers (2020-08-27T03:32:25Z)
- RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint 3D Object Detection and Motion Forecasting [13.544498422625448]
We present RV-FuseNet, a novel end-to-end approach for joint detection and trajectory estimation.
Instead of the widely used bird's eye view (BEV) representation, we utilize the native range view (RV) representation of LiDAR data.
We show that our approach significantly improves motion forecasting performance over the existing state-of-the-art.
arXiv Detail & Related papers (2020-05-21T19:22:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.