TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic
Scene Understanding
- URL: http://arxiv.org/abs/2311.03427v1
- Date: Mon, 6 Nov 2023 18:20:02 GMT
- Title: TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic
Scene Understanding
- Authors: Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei
Wang, Zhengxin Li, Shenghua Gao
- Abstract summary: We propose a Task-Specific Prompts Transformer, dubbed TSP-Transformer, for holistic scene understanding.
It features a vanilla transformer in the early stages and a task-specific prompts transformer encoder in the later stages, where task-specific prompts are augmented.
Experiments on NYUD-v2 and PASCAL-Context show that our method achieves state-of-the-art performance.
- Score: 38.40969494998194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Holistic scene understanding includes semantic segmentation, surface
normal estimation, object boundary detection, depth estimation, etc. The key
aspect of this problem is learning representations effectively, since each
subtask builds on attributes that are not only correlated but also distinct.
Inspired by visual-prompt tuning, we propose a Task-Specific Prompts
Transformer, dubbed TSP-Transformer, for holistic scene understanding. It
features a vanilla transformer in the early stages and a task-specific prompts
transformer encoder in the later stages, where task-specific prompts are
augmented. In this way, the transformer layers learn generic information in the
shared parts while being endowed with task-specific capacity. First, the
task-specific prompts serve as effective induced priors for each task.
Moreover, the task-specific prompts can be seen as switches that favor
task-specific representation learning for the different tasks. Extensive
experiments on NYUD-v2 and PASCAL-Context show that our method achieves
state-of-the-art performance, validating its effectiveness for holistic scene
understanding. Our code is available at
https://github.com/tb2-sy/TSP-Transformer.
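To make the mechanism concrete, here is a minimal PyTorch sketch of the idea the abstract describes: tokens pass through a shared vanilla stage, and later encoder layers prepend learnable per-task prompt tokens before self-attention. All class names and dimensions below are hypothetical illustrations, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn

class TaskPromptedEncoderLayer(nn.Module):
    """Later-stage encoder layer: learnable per-task prompt tokens are
    prepended to the shared tokens before self-attention, acting as
    task-specific priors ("switches") on top of shared weights."""

    def __init__(self, dim, num_heads, num_tasks, num_prompts):
        super().__init__()
        # One learnable prompt bank per task (hypothetical sizes).
        self.prompts = nn.Parameter(torch.randn(num_tasks, num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, tokens, task_id):
        # tokens: (B, N, dim) shared features from the early, vanilla stage.
        b = tokens.size(0)
        p = self.prompts[task_id].expand(b, -1, -1)   # (B, P, dim)
        x = torch.cat([p, tokens], dim=1)             # augment with prompts
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]                 # shared attention weights
        x = x + self.mlp(self.norm2(x))
        return x[:, p.size(1):]                       # drop the prompt slots

# Toy usage: the same shared layer specializes per task via its prompts.
layer = TaskPromptedEncoderLayer(dim=64, num_heads=4, num_tasks=4, num_prompts=8)
feats = torch.randn(2, 196, 64)                       # e.g. a 14x14 patch grid
seg = layer(feats, task_id=0)                         # semantic segmentation
depth = layer(feats, task_id=1)                       # depth estimation
print(seg.shape, depth.shape)                         # (2, 196, 64) each
```

Because only the prompt bank is indexed by the task, the attention and MLP weights remain shared across tasks, which is what lets the prompts behave as lightweight switches rather than separate per-task branches.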
Related papers
- DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding [7.470587868134298]
Point scene understanding is a challenging task that requires processing real-world scene point clouds.
The most recent state-of-the-art method first segments each object and then processes each one independently, using multiple stages for the different sub-tasks.
We propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation.
arXiv Detail & Related papers (2024-03-25T05:22:34Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data [85.48684148629634]
We propose an approach to leverage synthetic scene data for improving video understanding.
We present a multi-task prompt learning approach for video transformers.
We show strong performance improvements on multiple video understanding tasks and datasets.
arXiv Detail & Related papers (2022-12-08T18:55:31Z)
- Multitask Vision-Language Prompt Tuning [103.5967011236282]
We propose multitask vision-language prompt tuning (MVLPT), which incorporates cross-task knowledge into prompt tuning for vision-language models.
Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods.
arXiv Detail & Related papers (2022-11-21T18:41:44Z)
- Multi-Task Learning with Multi-Query Transformer for Dense Prediction [38.476408482050815]
We propose a simple pipeline named Multi-Query Transformer (MQTransformer) to facilitate reasoning among multiple tasks.
Instead of modeling the dense per-pixel context among different tasks, we seek a task-specific proxy to perform cross-task reasoning.
Experimental results show that the proposed method is effective and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-05-28T06:51:10Z)
- Vector-Quantized Input-Contextualized Soft Prompts for Natural Language Understanding [62.45760673220339]
We propose a novel way of prompting, Vector-quantized Input-contextualized Prompt Tuning or VIP.
Over a wide range of natural language understanding tasks, our proposed VIP framework beats the prompt tuning (PT) model by a margin of 1.19%.
arXiv Detail & Related papers (2022-05-23T03:51:27Z)
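As a rough illustration of what vector-quantizing a prompt involves, the sketch below snaps contextualized prompt vectors to their nearest codebook entries with a straight-through gradient estimator. This is a generic VQ layer under assumed names, not the VIP authors' code; details such as codebook updates and the input contextualizer are omitted.

```python
import torch
import torch.nn as nn

class VectorQuantizedPrompt(nn.Module):
    """Snap contextualized prompt vectors to their nearest codebook
    entries; the straight-through estimator keeps gradients flowing."""

    def __init__(self, dim, codebook_size):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, prompts):
        # prompts: (B, P, dim) input-contextualized soft prompts.
        codes = self.codebook.weight.expand(prompts.size(0), -1, -1)
        idx = torch.cdist(prompts, codes).argmin(dim=-1)  # nearest code id
        quantized = self.codebook(idx)                    # (B, P, dim)
        # Forward pass uses the quantized vectors; backward sees `prompts`.
        return prompts + (quantized - prompts).detach()

vq = VectorQuantizedPrompt(dim=32, codebook_size=128)
print(vq(torch.randn(4, 10, 32)).shape)  # torch.Size([4, 10, 32])
```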
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
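The shared-encoder, task-specific-decoder pattern that the MulT summary describes can be sketched as follows; the class names, query counts, and layer sizes here are hypothetical and not drawn from the MulT implementation.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Shared-encoder / per-task-decoder pattern: one representation is
    computed once, then each task reads it out through its own
    lightweight transformer decoder head."""

    def __init__(self, dim, num_heads, tasks):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True),
            num_layers=2)
        # One learnable query set and one decoder head per task.
        self.queries = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(16, dim) * 0.02) for t in tasks})
        self.decoders = nn.ModuleDict(
            {t: nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
             for t in tasks})

    def forward(self, tokens):
        shared = self.encoder(tokens)  # encoded once for all tasks
        out = {}
        for t, dec in self.decoders.items():
            q = self.queries[t].expand(tokens.size(0), -1, -1)
            out[t] = dec(q, shared)    # task-specific cross-attention read-out
        return out

model = SharedEncoderMultiTask(dim=64, num_heads=4,
                               tasks=["segmentation", "depth"])
preds = model(torch.randn(2, 196, 64))
print({t: o.shape for t, o in preds.items()})
```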
- Zero-shot Learning by Generating Task-specific Adapters [38.452434222367515]
We introduce Hypter, a framework that improves zero-shot transferability by training a hypernetwork to generate task-specific adapters from task descriptions.
This formulation enables learning at the task level and greatly reduces the number of parameters by using lightweight adapters.
arXiv Detail & Related papers (2021-01-02T10:50:23Z)
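To illustrate the hypernetwork-generates-adapters idea from the Hypter entry above, the sketch below maps a task-description embedding to the weights of a small bottleneck adapter. All names and sizes are made up for illustration; this is not the Hypter codebase.

```python
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """A hypernetwork maps a task-description embedding to the weights
    of a small bottleneck adapter, so a new task gets its own adapter
    without task-specific training."""

    def __init__(self, task_dim, hidden_dim, bottleneck):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        # Generates the down- and up-projection weights of one adapter.
        n_params = 2 * hidden_dim * bottleneck
        self.hyper = nn.Sequential(nn.Linear(task_dim, 256), nn.ReLU(),
                                   nn.Linear(256, n_params))

    def forward(self, task_emb, hidden_states):
        # task_emb: (task_dim,) embedding of the task description.
        # hidden_states: (B, N, hidden_dim) activations to adapt.
        w = self.hyper(task_emb)
        cut = self.hidden_dim * self.bottleneck
        down = w[:cut].view(self.hidden_dim, self.bottleneck)
        up = w[cut:].view(self.bottleneck, self.hidden_dim)
        # Residual bottleneck adapter built from the generated weights.
        return hidden_states + torch.relu(hidden_states @ down) @ up

hyper = AdapterHypernetwork(task_dim=128, hidden_dim=64, bottleneck=8)
task_description = torch.randn(128)   # stand-in for an encoded description
adapted = hyper(task_description, torch.randn(2, 10, 64))
print(adapted.shape)                  # torch.Size([2, 10, 64])
```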