Multi-task Learning for Real-time Autonomous Driving Leveraging
Task-adaptive Attention Generator
- URL: http://arxiv.org/abs/2403.03468v1
- Date: Wed, 6 Mar 2024 05:04:40 GMT
- Title: Multi-task Learning for Real-time Autonomous Driving Leveraging
Task-adaptive Attention Generator
- Authors: Wonhyeok Choi, Mingyu Shin, Hyukzae Lee, Jaehoon Cho, Jaehyeon Park,
Sunghoon Im
- Abstract summary: We present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation.
To counter the challenge of negative transfer, a prevalent issue in multi-task learning, we introduce a task-adaptive attention generator.
Our rigorously optimized network, when tested on the Cityscapes-3D dataset, consistently outperforms various baseline models.
- Score: 15.94714567272497
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Real-time processing is crucial in autonomous driving systems due to the
imperative of instantaneous decision-making and rapid response. In real-world
scenarios, autonomous vehicles are continuously tasked with interpreting their
surroundings, analyzing intricate sensor data, and making decisions within
split seconds to ensure safety through numerous computer vision tasks. In this
paper, we present a new real-time multi-task network adept at three vital
autonomous driving tasks: monocular 3D object detection, semantic segmentation,
and dense depth estimation. To counter the challenge of negative transfer,
a prevalent issue in multi-task learning, we introduce a
task-adaptive attention generator. This generator is designed to automatically
discern interrelations across the three tasks and arrange the task-sharing
pattern, all while leveraging the efficiency of the hard-parameter sharing
approach. To the best of our knowledge, the proposed model is pioneering in its
capability to concurrently handle multiple tasks, notably 3D object detection,
while maintaining real-time processing speeds. Our rigorously optimized
network, when tested on the Cityscapes-3D dataset, consistently outperforms
various baseline models. Moreover, an in-depth ablation study substantiates the
efficacy of the methodologies integrated into our framework.
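
The abstract combines hard-parameter sharing with task-specific attention but does not spell out the generator's design. As a rough illustration only, the sketch below shows the general pattern under simplifying assumptions: a single shared encoder whose features are rescaled by a per-task gate vector before each task head. The class `SharedBackboneWithTaskGates`, its dimensions, and the static sigmoid gates are all hypothetical and are not the authors' architecture.

```python
import math
import random

TASKS = ("detection_3d", "segmentation", "depth")  # the three tasks named in the abstract


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class SharedBackboneWithTaskGates:
    """Hypothetical toy model: one shared encoder, one gate vector and one head per task."""

    def __init__(self, in_dim=4, feat_dim=6, out_dim=2, seed=0):
        rng = random.Random(seed)
        # Hard-parameter sharing: a single encoder used by every task.
        self.encoder = random_matrix(feat_dim, in_dim, rng)
        # Task-specific parameters: gate logits and a small linear head (assumed design).
        self.gate_logits = {
            t: [rng.uniform(-2.0, 2.0) for _ in range(feat_dim)] for t in TASKS
        }
        self.heads = {t: random_matrix(out_dim, feat_dim, rng) for t in TASKS}

    def shared_features(self, x):
        # ReLU over the shared linear projection.
        return [max(0.0, h) for h in matvec(self.encoder, x)]

    def attention(self, task):
        # Each gate lies in (0, 1) and scales one shared channel for this task.
        return [sigmoid(g) for g in self.gate_logits[task]]

    def forward(self, x):
        feats = self.shared_features(x)  # computed once, reused by all tasks
        outputs = {}
        for task in TASKS:
            gated = [a * f for a, f in zip(self.attention(task), feats)]
            outputs[task] = matvec(self.heads[task], gated)
        return outputs


model = SharedBackboneWithTaskGates()
predictions = model.forward([0.5, -1.0, 2.0, 0.1])
```

Because the encoder runs once per input, the cost of adding a task is limited to its gate vector and head, which is the efficiency argument usually made for hard-parameter sharing. In a trained model the gates would be learned, letting each task suppress shared channels that hurt it.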
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
(2024-07-15T16:25:07Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection
in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
(2024-03-10T10:36:32Z)
- 3D Object Visibility Prediction in Autonomous Driving [6.802572869909114]
We present a novel attribute and its corresponding algorithm: 3D object visibility.
Our proposal of this attribute and its computational strategy aims to expand the capabilities for downstream tasks.
(2024-03-06T13:07:42Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception
Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
(2023-07-17T21:22:17Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in
Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
(2023-03-03T08:54:06Z)
- Multitask Network for Joint Object Detection, Semantic Segmentation and
Human Pose Estimation in Vehicle Occupancy Monitoring [0.0]
We propose our Multitask Detection, Segmentation and Pose Estimation Network (MDSP).
Our architecture allows a flexible combination of the three mentioned tasks during a simple end-to-end training.
We perform comprehensive evaluations on the public datasets SVIRO and TiCaM in order to demonstrate the superior performance.
(2022-05-03T14:11:18Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a spatio-temporal multilayer perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
(2022-04-25T08:42:47Z)
- Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
(2020-12-26T19:41:15Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
(2020-11-15T10:21:28Z)