A Dynamic Feature Interaction Framework for Multi-task Visual Perception
- URL: http://arxiv.org/abs/2306.05061v1
- Date: Thu, 8 Jun 2023 09:24:46 GMT
- Title: A Dynamic Feature Interaction Framework for Multi-task Visual Perception
- Authors: Yuling Xi, Hao Chen, Ning Wang, Peng Wang, Yanning Zhang, Chunhua
Shen, Yifan Liu
- Abstract summary: We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task visual perception has a wide range of applications in scene
understanding such as autonomous driving. In this work, we devise an efficient
unified framework to solve multiple common perception tasks, including instance
segmentation, semantic segmentation, monocular 3D detection, and depth
estimation. Simply sharing the same visual feature representations for these
tasks impairs task performance, while independent task-specific feature
extractors incur parameter redundancy and added latency. Thus, we design two
feature-merge branches to learn a feature basis that is useful to, and thus
shared by, multiple perception tasks. Each task then takes the corresponding
feature basis as input to its prediction head. In particular, one feature-merge
branch is designed for instance-level recognition and the other for dense
predictions. To enhance inter-branch
communication, the instance branch passes pixel-wise spatial information of
each instance to the dense branch using efficient dynamic convolution
weighting. Moreover, a simple but effective dynamic routing mechanism is
proposed to isolate task-specific features and leverage common properties among
tasks. Our proposed framework, termed D2BNet, demonstrates a unique approach to
parameter-efficient predictions for multi-task perception. In addition, as
tasks benefit from co-training with each other, our solution achieves on-par
results in partially labeled settings on nuScenes and outperforms previous
work on 3D detection and depth estimation on the Cityscapes dataset with full
supervision.
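The inter-branch communication described above can be illustrated with a small sketch: the instance branch produces per-instance embeddings, from which convolution weights are generated and applied to the dense-branch feature map, so instance-specific spatial cues reach the dense predictions. This is a minimal PyTorch sketch of that idea, not the authors' implementation; the module name, embedding size, feature sizes, and the 1x1 dynamic kernels are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicInstanceToDense(nn.Module):
    """Hypothetical sketch of instance-to-dense dynamic convolution weighting."""

    def __init__(self, dense_channels: int = 64, query_dim: int = 256):
        super().__init__()
        self.dense_channels = dense_channels
        # Predicts one 1x1 convolution kernel per instance embedding (assumption).
        self.weight_head = nn.Linear(query_dim, dense_channels)

    def forward(self, instance_queries: torch.Tensor, dense_feat: torch.Tensor):
        # instance_queries: (N, query_dim) per-instance embeddings from the instance branch
        # dense_feat:       (1, C, H, W) shared dense-branch feature map
        kernels = self.weight_head(instance_queries)          # (N, C)
        kernels = kernels.view(-1, self.dense_channels, 1, 1)  # (N, C, 1, 1)
        # Dynamic convolution: each instance's kernel yields one spatial map
        # highlighting where that instance lives on the dense features.
        instance_maps = F.conv2d(dense_feat, kernels)          # (1, N, H, W)
        # Fold the instance-aware maps back into the dense branch (simple sum here).
        enriched = dense_feat + instance_maps.sum(dim=1, keepdim=True)
        return enriched, instance_maps

# Usage sketch with made-up sizes
queries = torch.randn(8, 256)
feat = torch.randn(1, 64, 32, 64)
enriched, maps = DynamicInstanceToDense()(queries, feat)
```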
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z) - EffiPerception: an Efficient Framework for Various Perception Tasks [6.1522068855729755]
EffiPerception is a framework that explores common learning patterns across perception tasks.
It achieves robust accuracy with relatively low memory cost across several perception tasks.
EffiPerception shows strong overall accuracy-speed-memory gains across the four detection and segmentation tasks.
arXiv Detail & Related papers (2024-03-18T23:22:37Z)
- Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)