Prompt-Matched Semantic Segmentation
- URL: http://arxiv.org/abs/2208.10159v1
- Date: Mon, 22 Aug 2022 09:12:53 GMT
- Title: Prompt-Matched Semantic Segmentation
- Authors: Lingbo Liu, Bruce X.B. Yu, Jianlong Chang, Qi Tian, Chang-Wen Chen
- Abstract summary: The objective of this work is to explore how to effectively adapt pre-trained foundation models to various downstream tasks of image semantic segmentation.
We propose a novel Inter-Stage Prompt-Matched Framework, which maintains the original structure of the foundation model while generating visual prompts adaptively for task-oriented tuning.
A lightweight module termed Semantic-aware Prompt Matcher is then introduced to hierarchically interpolate between two stages to learn reasonable prompts for each specific task.
- Score: 96.99924127527002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of this work is to explore how to effectively and efficiently
adapt pre-trained foundation models to various downstream tasks of image
semantic segmentation. Conventional methods usually fine-tuned the whole
networks for each specific dataset and it was burdensome to store the massive
parameters of these networks. A few recent works attempted to insert some
trainable parameters into the frozen network to learn visual prompts for
efficient tuning. However, these works significantly modified the original
structure of standard modules, making them inoperable on many existing
high-speed inference devices, where standard modules and their parameters have
been embedded. To facilitate prompt-based semantic segmentation, we propose a
novel Inter-Stage Prompt-Matched Framework, which maintains the original
structure of the foundation model while generating visual prompts adaptively
for task-oriented tuning. Specifically, the pre-trained model is first divided
into multiple stages, and their parameters are frozen and shared for all
semantic segmentation tasks. A lightweight module termed Semantic-aware Prompt
Matcher is then introduced to hierarchically interpolate between two stages to
learn reasonable prompts for each specific task under the guidance of interim
semantic maps. In this way, we can better stimulate the pre-trained knowledge
of the frozen model to learn semantic concepts effectively on downstream
datasets. Extensive experiments conducted on five benchmarks show that the
proposed method can achieve a promising trade-off between parameter efficiency
and performance effectiveness.
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All [24.50129285997307]
We introduce a novel key-query learning strategy to enhance prompt matching efficiency and address the challenge of shifting features.
Our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.
arXiv Detail & Related papers (2023-11-26T20:35:19Z) - Distribution-Aware Prompt Tuning for Vision-Language Models [20.02599087680773]
A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed.
Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models.
Our experiments on 11 benchmark datasets demonstrate that our method significantly improves generalizability.
arXiv Detail & Related papers (2023-09-06T23:49:11Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Guiding the PLMs with Semantic Anchors as Intermediate Supervision:
Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to incorporate the current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z) - SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision [54.16430358203348]
We propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference.
We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance.
arXiv Detail & Related papers (2022-07-13T14:41:05Z) - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of
Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, ie, semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM)
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic
arXiv Detail & Related papers (2022-06-21T17:40:55Z) - Unfreeze with Care: Space-Efficient Fine-Tuning of Semantic Parsing
Models [5.893781742558463]
We examine two promising techniques, prefix tuning and bias-term tuning, specifically on semantic parsing.
We compare them against each other on two different semantic parsing datasets, and we also compare them against full and partial fine-tuning, both in few-shot and conventional data settings.
While prefix tuning is shown to do poorly for semantic parsing tasks off the shelf, we modify it by adding special token embeddings, which results in very strong performance without compromising parameter savings.
arXiv Detail & Related papers (2022-03-05T04:30:03Z) - SSA: Semantic Structure Aware Inference for Weakly Pixel-Wise Dense
Predictions without Cost [36.27226683586425]
The semantic structure aware inference (SSA) is proposed to explore the semantic structure information hidden in different stages of the CNN-based network to generate high-quality CAM in the model inference.
The proposed method has the advantage of no parameters and does not need to be trained. Therefore, it can be applied to a wide range of weakly-supervised pixel-wise dense prediction tasks.
arXiv Detail & Related papers (2021-11-05T11:07:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.