A Simple and Efficient Multi-task Network for 3D Object Detection and
Road Understanding
- URL: http://arxiv.org/abs/2103.04056v1
- Date: Sat, 6 Mar 2021 08:00:26 GMT
- Title: A Simple and Efficient Multi-task Network for 3D Object Detection and
Road Understanding
- Authors: Di Feng, Yiyang Zhou, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan
- Abstract summary: We show that it is possible to perform all perception tasks via a simple and efficient multi-task network.
Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding.
- Score: 20.878931360708343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting dynamic objects and predicting static road information such as
drivable areas and ground heights are crucial for safe autonomous driving.
Previous works studied each perception task separately, and lacked a collective
quantitative analysis. In this work, we show that it is possible to perform all
perception tasks via a simple and efficient multi-task network. Our proposed
network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six
perception outputs for 3D object detection and road understanding. The network
is based on an encoder-decoder architecture with 3D sparse convolution and
deconvolution operations. Extensive experiments show that the proposed method achieves accuracies competitive with state-of-the-art object detectors and other task-specific networks. LidarMTL is also leveraged for online localization.
Code and pre-trained model have been made available at
https://github.com/frankfengdi/LidarMTL.
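For a concrete picture of the multi-task encoder-decoder idea described in the abstract, the following is a minimal, hypothetical sketch. It uses dense torch.nn.Conv3d / ConvTranspose3d layers as a stand-in for the paper's 3D sparse convolution and deconvolution operations; the class name TinyLidarMTL, the four illustrative task heads, and the toy voxel-grid shapes are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal, hypothetical sketch of a multi-task encoder-decoder on a voxelized
# LiDAR point cloud. Dense 3D convolutions stand in for the paper's sparse ops.
import torch
import torch.nn as nn


class TinyLidarMTL(nn.Module):
    """Voxelized point cloud in, several per-voxel task outputs out."""

    def __init__(self, in_channels: int = 4, base_channels: int = 16):
        super().__init__()
        # Encoder: strided 3D convolutions reduce spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base_channels, 2 * base_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: transposed 3D convolutions restore the resolution
        # (the paper uses sparse deconvolutions instead).
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(2 * base_channels, base_channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base_channels, base_channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Illustrative task heads sharing the decoded features; the real network
        # predicts six outputs for 3D detection and road understanding.
        self.object_cls_head = nn.Conv3d(base_channels, 2, 1)     # e.g. foreground score
        self.box_reg_head = nn.Conv3d(base_channels, 7, 1)        # e.g. box parameters
        self.drivable_head = nn.Conv3d(base_channels, 1, 1)       # e.g. drivable area
        self.ground_height_head = nn.Conv3d(base_channels, 1, 1)  # e.g. ground height

    def forward(self, voxels: torch.Tensor) -> dict:
        features = self.decoder(self.encoder(voxels))
        return {
            "object_cls": self.object_cls_head(features),
            "box_reg": self.box_reg_head(features),
            "drivable_area": self.drivable_head(features),
            "ground_height": self.ground_height_head(features),
        }


if __name__ == "__main__":
    # Toy voxel grid: batch of 1, 4 input features, 32^3 voxels.
    dummy = torch.randn(1, 4, 32, 32, 32)
    outputs = TinyLidarMTL()(dummy)
    print({k: tuple(v.shape) for k, v in outputs.items()})
```

The design point illustrated here is that all task heads share one encoder-decoder backbone, so each additional road-understanding output costs only a lightweight prediction layer on top of the shared features.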
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D Object Detection [2.5158048364984564]
This work proposes a network structure for multi-view 3D object detection using camera-only data and a Bird's-Eye-View map.
The work targets a current key challenge: domain adaptation and visual data transfer.
The study uses 3D information as available semantic information and blends 2D multi-view image features into the visual-language transfer design.
arXiv Detail & Related papers (2023-11-02T04:28:51Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network can reconstruct the scene surface from only sparse input points, it likely also captures some fragments of semantic information (a minimal illustrative sketch of this pretext idea follows the related-papers list below).
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
The 3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [40.34710686994996]
3D object detection is an emerging task in autonomous driving scenarios.
Previous works process 3D point clouds using either projection-based or voxel-based models.
We propose the Stereo RGB and Deeper LIDAR framework which can utilize semantic and spatial information simultaneously.
arXiv Detail & Related papers (2020-06-09T11:19:24Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method outperforms the state-of-the-art by 5% on object detection in ScanNet scenes and by 3.4% on the Open dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
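As referenced in the ALSO entry above, the following is a minimal, hypothetical sketch of the surface-reconstruction (occupancy) pretext idea: a point-cloud backbone (here a tiny PointNet-style MLP stand-in) encodes the LiDAR returns, and an occupancy head classifies query points as lying on or off the observed surface. The module names, the query-sampling scheme, and the loss are illustrative assumptions, not the ALSO authors' code.

```python
# Hypothetical occupancy-style pretext task for self-supervised point-cloud pre-training.
import torch
import torch.nn as nn


class TinyOccupancyPretext(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Backbone stand-in: per-point MLP followed by max pooling over the cloud.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # Occupancy head: scene feature concatenated with a query coordinate.
        self.occ_head = nn.Sequential(
            nn.Linear(feat_dim + 3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 1)
        )

    def forward(self, points: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) LiDAR returns; queries: (M, 3) query locations.
        scene_feat = self.point_mlp(points).max(dim=0).values        # (feat_dim,)
        scene_feat = scene_feat.expand(queries.shape[0], -1)         # (M, feat_dim)
        logits = self.occ_head(torch.cat([scene_feat, queries], dim=-1))
        return logits.squeeze(-1)                                    # (M,) occupancy logits


# Toy pretext step: positives are slightly jittered copies of observed points
# (near the surface); negatives are points displaced by a large random offset.
points = torch.randn(1024, 3)
pos = points[:256] + 0.01 * torch.randn(256, 3)
neg = points[:256] + 0.5 * torch.randn(256, 3)
queries = torch.cat([pos, neg])
labels = torch.cat([torch.ones(256), torch.zeros(256)])

model = TinyOccupancyPretext()
loss = nn.functional.binary_cross_entropy_with_logits(model(points, queries), labels)
loss.backward()  # after pre-training, the backbone weights are reused for perception tasks
```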