TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation
- URL: http://arxiv.org/abs/2409.05393v2
- Date: Sat, 28 Dec 2024 09:34:11 GMT
- Title: TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation
- Authors: Jiaqi Yang, Yaning Zhang, Jingxi Hu, Xiangjian He, Linlin Shen, Guoping Qiu
- Abstract summary: We propose a task-adaptive auto-visual prompt framework for Cross-domain Few-shot Segmentation (CD-FSS).
We incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain agnostic feature extraction and generate high-quality, learnable visual prompts.
Our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
- Score: 40.49924427388922
- License:
- Abstract: While large visual models (LVMs) have demonstrated significant potential in image understanding thanks to large-scale pre-training, the Segment Anything Model (SAM) has also achieved great success in image segmentation, supporting flexible interactive cues and strong learning capabilities. However, SAM's performance often falls short in cross-domain and few-shot applications, and previous work has performed poorly in transferring prior knowledge from base models to new applications. To tackle this issue, we propose a task-adaptive auto-visual prompt framework, a new paradigm for Cross-domain Few-shot Segmentation (CD-FSS). First, a Multi-level Feature Fusion (MFF) module is used for integrated feature extraction as prior knowledge. In addition, we incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain agnostic feature extraction and generate high-quality, learnable visual prompts. This advancement combines a generative approach to prompts with a comprehensive model structure and specialized prototype computation. While preserving SAM's prior knowledge, the new branch disentangles category and domain information through prototypes, guiding the model's adaptation to CD-FSS. Comprehensive experiments across four cross-domain datasets demonstrate that our model outperforms the state-of-the-art CD-FSS approach, achieving average accuracy improvements of 1.3\% in the 1-shot setting and 11.76\% in the 5-shot setting.
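The prototype-guided matching described in the abstract follows a pattern common to few-shot segmentation methods: foreground features from the annotated support image are pooled into a class prototype, which is then compared against every query-pixel feature. The sketch below illustrates that generic pattern only; it is not TAVP's exact CDTAP computation, and the function names and the masked-average-pooling choice are illustrative assumptions.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Pool support features into a class prototype via masked average pooling.

    features: (C, H, W) support feature map
    mask: (H, W) binary foreground mask for the target class
    """
    mask = mask.astype(features.dtype)
    denom = mask.sum() + 1e-8  # avoid division by zero for empty masks
    # Zero out background locations, then average the foreground features
    return (features * mask[None]).sum(axis=(1, 2)) / denom

def cosine_similarity_map(query_features, prototype):
    """Score each query pixel against the prototype with cosine similarity.

    query_features: (C, H, W) query feature map
    prototype: (C,) class prototype
    Returns an (H, W) similarity map in [-1, 1].
    """
    q = query_features / (np.linalg.norm(query_features, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", q, p)
```

In a few-shot pipeline, the resulting similarity map would typically be thresholded or fed into a decoder to produce the query segmentation; methods like TAVP additionally use such prototypes to separate category from domain information.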
Related papers
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks.
MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization.
MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network toward constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z) - Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation [49.5901368256326]
We propose a novel Domain-Adaptive Prompt framework for fine-tuning the Segment Anything Model (termed as DAPSAM) in segmenting medical images.
Our DAPSAM achieves state-of-the-art performance on two medical image segmentation tasks with different modalities.
arXiv Detail & Related papers (2024-09-19T07:28:33Z) - EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation [22.586094394391747]
We propose a novel Efficient Meta Prompt Learning Framework for FS-UDA.
Within this framework, we use pre-trained CLIP model as the feature learning base model.
Our method has the large improvement of at least 15.4% on 5-way 1-shot and 8.7% on 5-way 5-shot, compared with the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-04T17:13:06Z) - APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation [33.90244697752314]
We introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS).
Our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy on 1-shot and 5-shot settings, respectively.
arXiv Detail & Related papers (2024-06-12T16:20:58Z) - Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation [0.0]
Cross-domain few-shot segmentation (CD-FSS) has emerged.
We show test-time task-adaption is the key for successful CD-FSS.
Despite our self-restriction not to use any images other than the few labeled samples at test time, we achieve new state-of-the-art performance in CD-FSS.
arXiv Detail & Related papers (2024-02-27T15:43:53Z) - Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and
Local Consensus Guided Cross Attention [7.939095881813804]
Few-shot segmentation aims to train a segmentation model that can fast adapt to a novel task for which only a few annotated images are provided.
We introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects.
The proposed IDA effectively increases the support set's diversity and promotes the distribution consistency between support and query images.
arXiv Detail & Related papers (2024-01-18T10:29:10Z) - Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype
Enhancement [6.197356908000006]
Few-Shot Segmentation (FSS) aims to accomplish the novel-class segmentation task with only a few annotated images.
We propose a novel framework based on the adapter mechanism, namely Adaptive FSS, which can efficiently adapt the existing FSS model to the novel classes.
Our approach is compatible with diverse FSS methods with different backbones by simply inserting PAM between the layers of the encoder.
arXiv Detail & Related papers (2023-12-25T14:03:38Z) - Disentangled Feature Representation for Few-shot Image Classification [64.40410801469106]
We propose a novel Disentangled Feature Representation framework, dubbed DFR, for few-shot learning applications.
DFR can adaptively decouple the discriminative features that are modeled by the classification branch, from the class-irrelevant component of the variation branch.
In general, most of the popular deep few-shot learning methods can be plugged in as the classification branch, thus DFR can boost their performance on various few-shot tasks.
arXiv Detail & Related papers (2021-09-26T09:53:11Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS)
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z) - SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive
Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most of advanced solutions exploit a metric learning framework that performs segmentation through matching each pixel to a learned foreground prototype.
This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.