Visual Exemplar Driven Task-Prompting for Unified Perception in
Autonomous Driving
- URL: http://arxiv.org/abs/2303.01788v1
- Date: Fri, 3 Mar 2023 08:54:06 GMT
- Title: Visual Exemplar Driven Task-Prompting for Unified Perception in
Autonomous Driving
- Authors: Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan
Liang
- Abstract summary: We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
- Score: 100.3848723827869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning has emerged as a powerful paradigm to solve a range of
tasks simultaneously with good efficiency in both computation resources and
inference time. However, most of these algorithms are designed for tasks
outside the scope of autonomous driving, which makes it hard to compare
multi-task methods in this domain. Aiming to enable the
comprehensive evaluation of present multi-task learning methods in autonomous
driving, we extensively investigate the performance of popular multi-task
methods on the large-scale driving dataset, which covers four common perception
tasks, i.e., object detection, semantic segmentation, drivable area
segmentation, and lane detection. We provide an in-depth analysis of current
multi-task learning methods under different common settings and find that
the existing methods make progress but there is still a large performance gap
compared with single-task baselines. To alleviate this dilemma in autonomous
driving, we present an effective multi-task framework, VE-Prompt, which
introduces visual exemplars via task-specific prompting to guide the model
toward learning high-quality task-specific representations. Specifically, we
generate visual exemplars based on bounding boxes and color-based markers,
which provide accurate visual appearances of target categories and further
mitigate the performance gap. Furthermore, we bridge transformer-based encoders
and convolutional layers for efficient and accurate unified perception in
autonomous driving. Comprehensive experimental results on the diverse
self-driving dataset BDD100K show that VE-Prompt improves the multi-task
baseline and further surpasses single-task models.
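Below is a minimal sketch of the task-specific prompting idea described in the abstract, assuming a PyTorch-style implementation: exemplar crops (bounding-box regions overlaid with category-specific color markers) are encoded into a few prompt tokens that are prepended to the shared image tokens. Names such as ExemplarTaskPrompter and num_prompts are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ExemplarTaskPrompter(nn.Module):
    """Turns exemplar crops into task-specific prompt tokens and prepends
    them to the shared image tokens of a transformer encoder."""
    def __init__(self, embed_dim=256, num_prompts=4):
        super().__init__()
        # A tiny conv stem encodes each exemplar crop (e.g. a bounding-box
        # crop overlaid with a category-specific color marker).
        self.exemplar_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_prompts = nn.Linear(embed_dim, embed_dim * num_prompts)
        self.num_prompts = num_prompts
        self.embed_dim = embed_dim

    def forward(self, exemplars, image_tokens):
        # exemplars: (B, 3, H, W) color-marked crops for one task
        # image_tokens: (B, N, embed_dim) tokens from the shared backbone
        feat = self.exemplar_encoder(exemplars).flatten(1)         # (B, D)
        prompts = self.to_prompts(feat)                            # (B, D*P)
        prompts = prompts.view(-1, self.num_prompts, self.embed_dim)
        # Prompt tokens are prepended so later attention layers can
        # condition every image token on the exemplar cues.
        return torch.cat([prompts, image_tokens], dim=1)

# Example usage (shapes only): one prompter per task, shared image tokens.
prompter = ExemplarTaskPrompter(embed_dim=256, num_prompts=4)
tokens = prompter(torch.randn(2, 3, 64, 64), torch.randn(2, 196, 256))
print(tokens.shape)  # torch.Size([2, 200, 256])
```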
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113] (2024-07-23)
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
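As a rough illustration of this summary (not the paper's code), joint classification and regression from a shared encoder can be sketched as below; the sizes n_bands, n_classes, and n_targets are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHSINet(nn.Module):
    def __init__(self, n_bands=200, n_classes=10, n_targets=3):
        super().__init__()
        # Shared encoder over the per-pixel spectral band vector.
        self.encoder = nn.Sequential(
            nn.Linear(n_bands, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.cls_head = nn.Linear(128, n_classes)  # e.g. land-cover class
        self.reg_head = nn.Linear(128, n_targets)  # e.g. continuous variables

    def forward(self, x):
        h = self.encoder(x)
        return self.cls_head(h), self.reg_head(h)

def multitask_loss(cls_logits, reg_pred, cls_target, reg_target, w=1.0):
    # Joint objective: cross-entropy plus a weighted regression term.
    return F.cross_entropy(cls_logits, cls_target) + w * F.mse_loss(reg_pred, reg_target)
```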
arXiv Detail & Related papers (2024-07-23T11:14:54Z) - RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
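One loose reading of the single-head, shared-representation idea is sketched below; RepVF's actual formulation differs, and the shapes (num_queries, points_per_target) are made up for illustration.

```python
import torch
import torch.nn as nn

class SingleVectorFieldHead(nn.Module):
    """Every query emits the same representation: a small set of 3D points
    plus per-task logits; 3D boxes and lane polylines are decoded from the
    same point set downstream."""
    def __init__(self, dim=256, points_per_target=8, num_tasks=2):
        super().__init__()
        self.point_head = nn.Linear(dim, points_per_target * 3)
        self.task_logits = nn.Linear(dim, num_tasks)

    def forward(self, queries):                     # (B, Q, dim)
        b, q, _ = queries.shape
        points = self.point_head(queries).view(b, q, -1, 3)
        return points, self.task_logits(queries)
```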
arXiv Detail & Related papers (2024-07-15T16:25:07Z) - Multi-task Learning for Real-time Autonomous Driving Leveraging
Task-adaptive Attention Generator [15.94714567272497]
We present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation.
To counter the challenge of negative transfer, which is the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator.
Our rigorously optimized network, when tested on the Cityscapes-3D dataset, consistently outperforms various baseline models.
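A hypothetical sketch of a task-adaptive attention generator follows: a small module produces one spatial attention map per task and gates the shared backbone features with it, which is one common way to limit negative transfer. Names and shapes are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TaskAdaptiveAttention(nn.Module):
    def __init__(self, channels=256, num_tasks=3):
        super().__init__()
        self.generators = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
                          nn.Conv2d(channels // 4, 1, 1), nn.Sigmoid())
            for _ in range(num_tasks)
        ])

    def forward(self, shared):                       # (B, C, H, W)
        # Each task sees the shared features re-weighted by its own map.
        return [shared * gen(shared) for gen in self.generators]
```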
arXiv Detail & Related papers (2024-03-06T05:04:40Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks that have little or non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
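One simple, hypothetical form of such distribution matching is a symmetric KL term that couples the soft predictions of two related classification heads on the same inputs; the paper's coupling is more task-specific, so this only illustrates the mechanism.

```python
import torch.nn.functional as F

def distribution_matching_loss(logits_a, logits_b, temperature=2.0):
    pa = F.log_softmax(logits_a / temperature, dim=-1)
    pb = F.log_softmax(logits_b / temperature, dim=-1)
    # Symmetric KL between the two task distributions (assumes the two
    # heads share a label space or are mapped onto a common one).
    kl_ab = F.kl_div(pa, pb, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(pb, pa, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)
```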
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception
Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
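A rough interpretation of the semantic weighting idea is sketched below: segmentation features produce a soft weighting that gates the features feeding the detection head. This is an illustration of the summary, not the published SWAG module.

```python
import torch
import torch.nn as nn

class SemanticWeighting(nn.Module):
    def __init__(self, sem_channels=128, det_channels=256):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(sem_channels, det_channels, 1), nn.Sigmoid())

    def forward(self, det_feat, sem_feat):
        # det_feat: (B, det_channels, H, W); sem_feat: (B, sem_channels, H, W)
        # Residual gating keeps the original detection signal intact.
        return det_feat * (1.0 + self.weight(sem_feat))
```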
arXiv Detail & Related papers (2023-07-17T21:22:17Z) - Multi-Task Consistency for Active Learning [18.794331424921946]
Inconsistency-based active learning has proven to be effective in selecting informative samples for annotation.
We propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation.
Our approach achieves 95% of the fully-trained performance using only 67% of the available data.
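A hedged sketch of inconsistency-based selection: score each unlabeled image by how much the detector and the segmenter disagree inside predicted boxes, then send the highest-scoring images for annotation. The scoring function below is a simple stand-in, not the paper's metric.

```python
import numpy as np

def inconsistency_score(boxes, labels, seg_pred):
    """boxes: list of (x1, y1, x2, y2); labels: class id per box;
    seg_pred: (H, W) argmax semantic map sharing the detector's classes."""
    scores = []
    for (x1, y1, x2, y2), cls in zip(boxes, labels):
        region = seg_pred[int(y1):int(y2), int(x1):int(x2)]
        if region.size == 0:
            continue
        # Fraction of pixels inside the box that the segmenter assigns
        # to a different class than the detector.
        scores.append(float(np.mean(region != cls)))
    return float(np.mean(scores)) if scores else 0.0

# Rank the unlabeled pool and pick the most inconsistent samples first:
# ranked = sorted(pool, key=lambda s: inconsistency_score(*s), reverse=True)
```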
arXiv Detail & Related papers (2023-06-21T17:34:31Z) - A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
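A simplified, hypothetical take on task-relational context: for every (source, target) task pair, the target feature is refined by a mix of a global cross-attention context and a local convolutional context, with a learned gate weighting the two. The real module samples from a richer pool of context types.

```python
import torch
import torch.nn as nn

class PairwiseContext(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.global_ctx = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.local_ctx = nn.Conv2d(dim, dim, 3, padding=1)
        self.gate = nn.Parameter(torch.zeros(2))     # logits over {global, local}

    def forward(self, target_feat, source_feat):     # both (B, C, H, W)
        b, c, h, w = target_feat.shape
        q = target_feat.flatten(2).transpose(1, 2)   # (B, HW, C)
        kv = source_feat.flatten(2).transpose(1, 2)
        glob, _ = self.global_ctx(q, kv, kv)
        glob = glob.transpose(1, 2).view(b, c, h, w)
        loc = self.local_ctx(source_feat)
        w_g, w_l = torch.softmax(self.gate, dim=0)
        return target_feat + w_g * glob + w_l * loc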
arXiv Detail & Related papers (2021-04-28T16:45:56Z) - Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
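The projection step described above can be sketched compactly (PCGrad-style surgery): when two task gradients conflict, i.e. their dot product is negative, one is projected onto the normal plane of the other. Flattened gradient vectors are assumed; this is not the authors' implementation.

```python
import torch

def project_conflicting(grads):
    """grads: list of flattened per-task gradient tensors of equal size."""
    projected = [g.clone() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = torch.dot(g_i, g_j)
            if dot < 0:
                # Remove the conflicting component of g_i along g_j.
                g_i -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
    return projected  # summed afterwards to form the shared update
```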
arXiv Detail & Related papers (2020-01-19T06:33:47Z)