BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions
- URL: http://arxiv.org/abs/2312.13514v2
- Date: Sat, 23 Nov 2024 05:48:25 GMT
- Title: BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions
- Authors: Jingdong Zhang, Jiayuan Fan, Peng Ye, Bo Zhang, Hancheng Ye, Baopu Li, Yancheng Cai, Tao Chen
- Abstract summary: Multi-task dense prediction aims to handle multiple pixel-wise prediction tasks simultaneously within a unified network for visual scene understanding.
To tackle these under-explored issues, we propose a novel BridgeNet framework, which extracts comprehensive and discriminative intermediate Bridge Features.
To the best of our knowledge, this is the first work to consider the completeness and quality of feature participants in cross-task interactions.
- Score: 29.049866510120093
- Abstract: Multi-task dense prediction aims to handle multiple pixel-wise prediction tasks simultaneously within a unified network for visual scene understanding. However, the cross-task feature interactions of current methods still suffer from incomplete levels of representation, weakly discriminative semantics in the feature participants, and inefficient pair-wise task interaction processes. To tackle these under-explored issues, we propose a novel BridgeNet framework, which extracts comprehensive and discriminative intermediate Bridge Features and conducts interactions based on them. Specifically, a Task Pattern Propagation (TPP) module is first applied to ensure that highly semantic task-specific feature participants are prepared for subsequent interactions, and a Bridge Feature Extractor (BFE) is specially designed to selectively integrate both high-level and low-level representations into comprehensive bridge features. Then, instead of conducting heavy pair-wise cross-task interactions, a Task-Feature Refiner (TFR) efficiently takes guidance from the bridge features to form the final task predictions. To the best of our knowledge, this is the first work to consider the completeness and quality of feature participants in cross-task interactions. Extensive experiments are conducted on the NYUD-v2, Cityscapes and PASCAL Context benchmarks, and the superior performance shows that the proposed architecture is effective in promoting different dense prediction tasks simultaneously.
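The abstract describes a staged flow: task-specific features are prepared (TPP), fused with low-level features into a shared bridge feature (BFE), and each task is then refined against that bridge alone (TFR). The toy Python sketch below illustrates only that data flow on per-pixel feature vectors. It is not the authors' implementation: the function names `bridge_feature`, `refine` and `bridgenet_step`, the averaging of task features, the sigmoid gate, and the similarity-scaled residual update are all hypothetical stand-ins, and the TPP stage is omitted.

```python
import math
from typing import Dict, List

Vec = List[float]    # one feature vector per pixel
FeatMap = List[Vec]  # a flattened feature map: pixels x channels


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bridge_feature(task_feats: Dict[str, FeatMap], low_level: FeatMap) -> FeatMap:
    """Hypothetical BFE stand-in: fuse high-level task features with low-level ones.

    High-level information is the channel-wise mean over all task features; a
    per-channel sigmoid gate (an assumption) decides how much of it to keep
    versus the low-level backbone feature.
    """
    tasks = list(task_feats)
    bridge: FeatMap = []
    for p, low in enumerate(low_level):
        high = [sum(task_feats[t][p][c] for t in tasks) / len(tasks) for c in range(len(low))]
        mixed = []
        for h, l in zip(high, low):
            g = sigmoid(h - l)  # gate in (0, 1)
            mixed.append(g * h + (1.0 - g) * l)
        bridge.append(mixed)
    return bridge


def refine(task_feat: FeatMap, bridge: FeatMap) -> FeatMap:
    """Hypothetical TFR stand-in: a task reads only the shared bridge.

    The bridge vector is added residually, scaled by a sigmoid of its dot
    product with the task feature (an assumption, not the paper's design).
    """
    refined: FeatMap = []
    for f, b in zip(task_feat, bridge):
        w = sigmoid(sum(x * y for x, y in zip(f, b)))
        refined.append([x + w * y for x, y in zip(f, b)])
    return refined


def bridgenet_step(task_feats: Dict[str, FeatMap], low_level: FeatMap) -> Dict[str, FeatMap]:
    """One interaction round: build one bridge, then refine every task against it."""
    bridge = bridge_feature(task_feats, low_level)
    return {t: refine(f, bridge) for t, f in task_feats.items()}


if __name__ == "__main__":
    # Two pixels, two channels, two hypothetical tasks.
    feats = {"semseg": [[1.0, 0.0], [0.5, 0.5]], "depth": [[0.0, 1.0], [0.5, 0.5]]}
    low = [[0.2, 0.2], [0.1, 0.3]]
    for task, fmap in bridgenet_step(feats, low).items():
        print(task, [[round(v, 3) for v in px] for px in fmap])
```

Because `refine` consults only the shared bridge, adding a task adds one refinement call rather than one interaction per existing task, which is the intuition behind the abstract's contrast with heavy pair-wise schemes.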
Related papers
- Task Indicating Transformer for Task-conditional Dense Predictions [16.92067246179703]
We introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge.
Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition.
We also propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement.
arXiv (2024-03-01T07:06:57Z)
- ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt [67.8934749027315]
We propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs.
We also propose a novel pre-training paradigm based on a group of $k$-nearest neighbors.
arXiv (2023-10-23T12:11:13Z)
- Contrastive Multi-Task Dense Prediction [11.227696986100447]
A core design objective is to model cross-task interactions effectively so as to improve performance comprehensively across the different tasks.
We introduce feature-wise contrastive consistency into modeling the cross-task interactions for multi-task dense prediction.
We propose a novel multi-task contrastive regularization method based on the consistency to effectively boost the representation learning of the different sub-tasks.
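Feature-wise contrastive consistency can be illustrated with a generic InfoNCE objective between two tasks' per-pixel features, where the same pixel location in the other task is the positive and all other locations are negatives. The following is a hedged Python sketch of that standard loss, not necessarily the exact regularizer of the cited paper; the function name `cross_task_infonce` and the temperature `tau=0.1` are assumptions.

```python
import math
from typing import List

Vec = List[float]


def _normalize(v: Vec, eps: float = 1e-8) -> Vec:
    """L2-normalize a vector; eps guards against division by zero."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / (n + eps) for x in v]


def cross_task_infonce(feat_a: List[Vec], feat_b: List[Vec], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: pixel i of task A should match pixel i of task B.

    feat_a, feat_b: per-pixel embeddings from two task branches (same length).
    """
    a = [_normalize(v) for v in feat_a]
    b = [_normalize(v) for v in feat_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every pixel of the other task, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / tau for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-probability of the positive
    return total / len(a)


if __name__ == "__main__":
    aligned = cross_task_infonce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = cross_task_infonce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned, swapped)
```

Corresponding pixels that agree across tasks drive the loss toward zero, while mismatched correspondences are penalized, which is one way to read "consistency" as a regularizer on the task branches.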
arXiv (2023-07-16T03:54:01Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv (2023-06-08T09:24:46Z)
- A Hierarchical Interactive Network for Joint Span-based Aspect-Sentiment Analysis [34.1489054082536]
We propose a hierarchical interactive network (HI-ASA) to model two-way interactions between two tasks appropriately.
We use a cross-stitch mechanism to selectively combine the different task-specific features as input, ensuring proper two-way interactions.
Experiments on three real-world datasets demonstrate HI-ASA's superiority over baselines.
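The cross-stitch mechanism (from Cross-Stitch Networks, Misra et al., 2016) mixes two task-specific features through a 2×2 matrix of coefficients, so each task's next-layer input is a weighted combination of both. Below is a minimal Python sketch of one cross-stitch unit; the function name `cross_stitch` and the default coefficients are illustrative placeholders (in practice the coefficients are learned), and how HI-ASA wires the unit into its hierarchy is not shown.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def cross_stitch(
    x_a: Sequence[float],
    x_b: Sequence[float],
    alpha: Matrix2 = ((0.9, 0.1), (0.1, 0.9)),
) -> Tuple[Vec, Vec]:
    """Linearly recombine two task features with a 2x2 cross-stitch matrix.

    [y_a]   [a_aa a_ab] [x_a]
    [y_b] = [a_ba a_bb] [x_b]   (applied to each feature entry)
    The default alpha values are placeholders; a real model learns them.
    """
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    y_a = [a_aa * p + a_ab * q for p, q in zip(x_a, x_b)]
    y_b = [a_ba * p + a_bb * q for p, q in zip(x_a, x_b)]
    return y_a, y_b


if __name__ == "__main__":
    aspect_feat = [1.0, 0.0, 2.0]     # hypothetical aspect-extraction features
    sentiment_feat = [0.0, 1.0, 2.0]  # hypothetical sentiment features
    print(cross_stitch(aspect_feat, sentiment_feat))
```

With the identity matrix the two tasks stay fully separate; off-diagonal weights control how much each task borrows from the other, which is what makes the two-way interaction selective.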
arXiv (2022-08-24T03:03:49Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv (2022-05-25T10:44:25Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv (2021-04-28T16:45:56Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv (2020-04-28T09:15:50Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv (2020-03-09T17:05:04Z)