TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation
- URL: http://arxiv.org/abs/2409.05393v1
- Date: Mon, 9 Sep 2024 07:43:58 GMT
- Title: TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation
- Authors: Jiaqi Yang, Ye Huang, Xiangjian He, Linlin Shen, Guoping Qiu
- Abstract summary: This work proposes a task-adaptive prompt framework based on the Segment Anything Model (SAM).
It uses a unique generative approach to prompts alongside a comprehensive model structure and specialized prototype computation.
With task-specific and weighted guidance, the rich feature information of SAM can be better learned for cross-domain few-shot segmentation.
- Score: 44.134340976905655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Under the backdrop of large-scale pre-training, large visual models (LVM) have demonstrated significant potential in image understanding. The recent emergence of the Segment Anything Model (SAM) has brought a qualitative shift in the field of image segmentation, supporting flexible interactive cues and strong learning capabilities. However, its performance often falls short in cross-domain and few-shot applications. Transferring prior knowledge from foundation models to new applications while preserving learning capabilities is worth exploring. This work proposes a task-adaptive prompt framework based on SAM, a new paradigm for cross-domain few-shot segmentation (CD-FSS). First, a Multi-level Feature Fusion (MFF) module is used for integrated feature extraction. In addition, a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module is combined with the segmentation branch for class-domain agnostic feature extraction and high-quality learnable prompt production. This design pairs a unique generative approach to prompts with a comprehensive model structure and specialized prototype computation. While ensuring that the prior knowledge of SAM is not discarded, the new branch disentangles category and domain information through prototypes, guiding adaptation to the CD-FSS task. Our method achieves the best results on three benchmarks compared with recent state-of-the-art (SOTA) methods. Comprehensive experiments show that with task-specific and weighted guidance, the rich feature information of SAM can be better learned for CD-FSS.
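As a rough illustration of the pipeline the abstract describes (multi-level feature fusion, prototype computation from support masks, and learnable prompt generation for a SAM-style decoder), here is a minimal PyTorch sketch. All module names, shapes, and design choices below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of MFF-style fusion and prototype-driven prompt generation.
# Everything here is an assumption for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFeatureFusion(nn.Module):
    """Project features from several encoder stages to one width and sum them."""
    def __init__(self, in_channels, dim=256):
        super().__init__()
        self.projs = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)

    def forward(self, feats):
        target = feats[0].shape[-2:]  # resize every level to the finest one
        fused = 0
        for f, proj in zip(feats, self.projs):
            fused = fused + proj(F.interpolate(
                f, size=target, mode="bilinear", align_corners=False))
        return fused

def masked_average_prototype(feat, mask):
    """Class prototype: mean of support features inside the support mask.
    feat: (B, C, H, W); mask: (B, 1, H, W) binary."""
    mask = F.interpolate(mask.float(), size=feat.shape[-2:], mode="nearest")
    return (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

class PrototypePromptGenerator(nn.Module):
    """Map a class prototype to a few sparse prompt tokens for a SAM-style decoder."""
    def __init__(self, dim=256, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.to_tokens = nn.Linear(dim, dim * num_tokens)

    def forward(self, prototype):            # prototype: (B, dim)
        tokens = self.to_tokens(prototype)   # (B, dim * num_tokens)
        return tokens.view(prototype.shape[0], self.num_tokens, -1)
```

In such a setup, the generated tokens would be passed to a frozen, prompt-conditioned SAM mask decoder, so the pre-trained prior is kept while the prompts adapt to the task.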
Related papers
- Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining [53.963279865355105]
Cross-domain few-shot segmentation (CD-FSS) aims to segment objects of novel classes in new domains.
Most CD-FSS methods redesign and retrain in-domain FSS models using various domain-generalization techniques.
We propose adapting informative model structures of the well-trained FSS model for target domains by learning domain characteristics from few-shot labeled support samples.
arXiv Detail & Related papers (2025-04-30T08:16:33Z)
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks.
MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization.
MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network in constructing more discriminative feature representations.
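One hedged reading of the multi-scale uncertainty consistency term is a per-scale teacher-student agreement loss, down-weighted where the teacher is uncertain. The exact loss form and weighting used by MUCA are not reproduced here.

```python
# Hypothetical uncertainty-weighted multi-scale consistency (assumed form).
import math

def uncertainty_weighted_consistency(student_logits, teacher_logits):
    """Both inputs are lists of logit maps (B, n_cls, H, W), one per scale.
    Pixels where the teacher is uncertain (high entropy) contribute less."""
    loss = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_t = t.detach().softmax(dim=1)
        entropy = -(p_t * p_t.clamp(min=1e-8).log()).sum(dim=1, keepdim=True)
        weight = 1.0 - entropy / math.log(p_t.shape[1])  # normalize to [0, 1]
        loss = loss + (weight * (s.softmax(dim=1) - p_t).pow(2)).mean()
    return loss / len(student_logits)
```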
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals [0.0]
We present Segment Any Class (SAC), a training-free approach that task-adapts SAM for Multi-class segmentation.
SAC generates Class-Region Proposals (CRP) on query images which allows us to automatically generate class-aware prompts.
SAC solely utilizes automated prompting and achieves superior results over state-of-the-art methods on the COCO-20i benchmark.
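As a hedged sketch of how class-region proposals could yield class-aware prompts (the paper's exact procedure may differ): score query locations against class prototypes and keep confident peaks as point prompts for a SAM-like model. Threshold and top-k values are assumptions.

```python
import torch
import torch.nn.functional as F

def class_region_point_prompts(query_feat, prototypes, top_k=3, threshold=0.7):
    """query_feat: (B, C, H, W); prototypes: (N_cls, C). Returns, per image
    and class, up to top_k high-confidence (x, y) point prompts."""
    B, C, H, W = query_feat.shape
    q = F.normalize(query_feat.flatten(2), dim=1)   # (B, C, H*W)
    p = F.normalize(prototypes, dim=-1)             # (N_cls, C)
    sim = torch.einsum("nc,bcl->bnl", p, q)         # (B, N_cls, H*W)
    prompts = []
    for b in range(B):
        for n in range(sim.shape[1]):
            scores, idx = sim[b, n].topk(top_k)
            keep = idx[scores > threshold]          # confident locations only
            ys, xs = keep // W, keep % W
            prompts.append((b, n, torch.stack([xs, ys], dim=-1)))
    return prompts
```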
arXiv Detail & Related papers (2024-11-21T01:04:53Z)
- Task Consistent Prototype Learning for Incremental Few-shot Semantic Segmentation [20.49085411104439]
Incremental Few-Shot Semantic Segmentation (iFSS) tackles a task that requires a model to continually expand its segmentation capability to novel classes.
This study introduces a meta-learning-based prototype approach that encourages the model to learn how to adapt quickly while preserving previous knowledge.
Experiments on iFSS datasets built upon PASCAL and COCO benchmarks show the advanced performance of the proposed approach.
arXiv Detail & Related papers (2024-10-16T23:42:27Z)
- Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation [49.5901368256326]
We propose a novel Domain-Adaptive Prompt framework (termed DAPSAM) for fine-tuning the Segment Anything Model for medical image segmentation.
Our DAPSAM achieves state-of-the-art performance on two medical image segmentation tasks with different modalities.
arXiv Detail & Related papers (2024-09-19T07:28:33Z)
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation [22.586094394391747]
We propose a novel Efficient Meta Prompt Learning Framework for FS-UDA.
Within this framework, we use a pre-trained CLIP model as the base model for feature learning.
Our method yields large improvements of at least 15.4% on 5-way 1-shot and 8.7% on 5-way 5-shot tasks over state-of-the-art methods.
arXiv Detail & Related papers (2024-07-04T17:13:06Z)
- APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation [33.90244697752314]
We introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS).
Our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy on 1-shot and 5-shot settings, respectively.
arXiv Detail & Related papers (2024-06-12T16:20:58Z)
- AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
- Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation [0.0]
Cross-domain few-shot segmentation (CD-FSS) has emerged.
We show that test-time task adaptation is the key to successful CD-FSS.
Despite our self-restriction not to use any images other than the few labeled samples at test time, we achieve new state-of-the-art performance in CD-FSS.
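A minimal sketch of test-time task adaptation under this constraint (only the few labeled support samples are available at test time); the optimizer, step count, and choice of which parameters are left trainable are assumptions.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, support_imgs, support_masks, steps=50, lr=1e-4):
    """Briefly fine-tune the (partially) trainable parts of the model on the
    K labeled support samples of the target task before query prediction.
    support_imgs: (K, 3, H, W); support_masks: (K, H, W) long class indices."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    model.train()
    for _ in range(steps):
        logits = model(support_imgs)                   # (K, n_cls, H, W)
        loss = F.cross_entropy(logits, support_masks)
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return model
```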
arXiv Detail & Related papers (2024-02-27T15:43:53Z)
- Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention [7.939095881813804]
Few-shot segmentation aims to train a segmentation model that can fast adapt to a novel task for which only a few annotated images are provided.
We introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects.
The proposed IDA effectively increases the support set's diversity and promotes the distribution consistency between support and query images.
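A hedged sketch of what size-conditioned, instance-aware augmentation can look like; the thresholds, scale factors, and cropping policy below are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def instance_aware_augment(img, mask, small_thresh=0.05, large_thresh=0.4):
    """Zoom in on small support objects and zoom out from large ones.
    img: (C, H, W); mask: (H, W) binary long tensor."""
    area_ratio = mask.float().mean().item()
    if 0.0 < area_ratio < small_thresh:
        # Small object: crop its bounding box and enlarge it to full size.
        ys, xs = mask.nonzero(as_tuple=True)
        y0, y1 = ys.min().item(), ys.max().item() + 1
        x0, x1 = xs.min().item(), xs.max().item() + 1
        img = F.interpolate(img[None, :, y0:y1, x0:x1], size=img.shape[-2:],
                            mode="bilinear", align_corners=False)[0]
        mask = F.interpolate(mask[None, None, y0:y1, x0:x1].float(),
                             size=mask.shape[-2:], mode="nearest")[0, 0].long()
    elif area_ratio > large_thresh:
        # Large object: shrink the image and paste it onto an empty canvas.
        small = F.interpolate(img[None], scale_factor=0.5,
                              mode="bilinear", align_corners=False)[0]
        small_m = F.interpolate(mask[None, None].float(),
                                scale_factor=0.5, mode="nearest")[0, 0].long()
        canvas, canvas_m = torch.zeros_like(img), torch.zeros_like(mask)
        canvas[:, :small.shape[1], :small.shape[2]] = small
        canvas_m[:small_m.shape[0], :small_m.shape[1]] = small_m
        img, mask = canvas, canvas_m
    return img, mask
```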
arXiv Detail & Related papers (2024-01-18T10:29:10Z)
- Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement [6.197356908000006]
Few-Shot Segmentation (FSS) aims to accomplish the novel class segmentation task with a few annotated images.
We propose a novel framework based on the adapter mechanism, namely Adaptive FSS, which can efficiently adapt the existing FSS model to the novel classes.
Our approach is compatible with diverse FSS methods with different backbones by simply inserting the Prototype Adaptive Module (PAM) between the layers of the encoder.
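The adapter idea can be sketched as follows; this is a generic prototype-conditioned bottleneck adapter assumed for illustration, not the paper's exact PAM design.

```python
import torch.nn as nn

class PrototypeAdapter(nn.Module):
    """Small bottleneck block, modulated by a class prototype, inserted
    between frozen encoder layers; only adapter weights are trained."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.scale = nn.Linear(dim, bottleneck)  # prototype -> channel gates
        self.act = nn.GELU()

    def forward(self, x, prototype):   # x: (B, L, dim), prototype: (B, dim)
        h = self.act(self.down(x))
        h = h * self.scale(prototype).unsqueeze(1)  # condition on the class
        return x + self.up(h)           # residual, so insertion is safe
```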
arXiv Detail & Related papers (2023-12-25T14:03:38Z)
- Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models [14.292149307183967]
This research introduces a structured framework designed for the automation of few-shot semantic segmentation.
It utilizes the SAM model and facilitates a more efficient generation of semantically discernible segmentation outcomes.
Central to our methodology is a novel automatic prompt learning approach, leveraging prior guided masks to produce coarse pixel-wise prompts for SAM.
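One hedged way to realize such coarse pixel-wise prompts is to sample positive and negative point prompts from the prior-guided mask; the sampling counts and strategy below are assumptions.

```python
import torch

def points_from_coarse_mask(coarse_mask, n_pos=3, n_neg=3, generator=None):
    """coarse_mask: (H, W) binary. Samples positive points inside the coarse
    foreground and negative points outside it, in SAM's (x, y) convention."""
    fg = coarse_mask.nonzero()            # (N_fg, 2) as (y, x)
    bg = (coarse_mask == 0).nonzero()
    pos = fg[torch.randperm(len(fg), generator=generator)[:n_pos]]
    neg = bg[torch.randperm(len(bg), generator=generator)[:n_neg]]
    points = torch.cat([pos, neg]).flip(-1).float()   # (y, x) -> (x, y)
    labels = torch.cat([torch.ones(len(pos)), torch.zeros(len(neg))])
    return points, labels
```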
arXiv Detail & Related papers (2023-11-22T07:07:55Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
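A hedged sketch of a lightweight cross-modal projector in this spirit: mapping a text embedding of the referring expression into sparse and dense prompt embeddings for a SAM-style decoder. Dimensions and structure are assumptions.

```python
import torch.nn as nn

class CrossModalMLP(nn.Module):
    """Project a sentence embedding into sparse (token) and dense (spatial
    bias) prompt embeddings; shapes and design are illustrative only."""
    def __init__(self, text_dim=512, prompt_dim=256, num_sparse=2):
        super().__init__()
        self.num_sparse = num_sparse
        self.sparse = nn.Sequential(
            nn.Linear(text_dim, prompt_dim * num_sparse), nn.GELU())
        self.dense = nn.Linear(text_dim, prompt_dim)

    def forward(self, text_emb):                        # (B, text_dim)
        sparse = self.sparse(text_emb).view(text_emb.shape[0],
                                            self.num_sparse, -1)
        dense = self.dense(text_emb)[:, :, None, None]  # broadcast over H, W
        return sparse, dense
```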
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model [29.42043345787285]
We propose a method to learn the generation of appropriate prompts for the Segment Anything Model (SAM).
This enables SAM to produce semantically discernible segmentation results for remote sensing images.
We also propose several ongoing derivatives for instance segmentation tasks, drawing on recent advancements within the SAM community, and compare their performance with RSPrompter.
arXiv Detail & Related papers (2023-06-28T14:51:34Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish objects from the background and handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
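The entropy-based uncertainty modeling step can be sketched as ranking pseudo-labelled images by predictive entropy and splitting them into clean and noisy pools; the split fraction here is an assumption.

```python
import torch

def split_by_entropy(probs, clean_fraction=0.5):
    """probs: list of per-image softmax maps (n_cls, H, W). Returns indices
    of the most confident (clean) images and the rest (noisy), to be
    relabelled or down-weighted during self-training."""
    entropies = torch.stack([
        (-(p * p.clamp(min=1e-8).log()).sum(dim=0)).mean() for p in probs
    ])
    order = entropies.argsort()              # ascending: most confident first
    n_clean = int(len(probs) * clean_fraction)
    return order[:n_clean], order[n_clean:]
```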
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- Disentangled Feature Representation for Few-shot Image Classification [64.40410801469106]
We propose a novel Disentangled Feature Representation framework, dubbed DFR, for few-shot learning applications.
DFR can adaptively decouple the discriminative features that are modeled by the classification branch, from the class-irrelevant component of the variation branch.
In general, most of the popular deep few-shot learning methods can be plugged in as the classification branch, thus DFR can boost their performance on various few-shot tasks.
arXiv Detail & Related papers (2021-09-26T09:53:11Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification because sample pairs are constructed with the foreground prototype only.
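The matching baseline described here, and the bias it induces, can be made concrete with a short sketch: with only a foreground prototype the per-pixel decision is one-sided, which is what self-contrastive background prototypes aim to correct. The temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def prototype_matching_logits(query_feat, fg_prototype, bg_prototype=None, tau=20.0):
    """Cosine similarity between each query pixel and the class prototype(s).
    query_feat: (B, C, H, W); prototypes: (B, C). Without a background
    prototype the classification is one-sided (the bias noted above)."""
    q = F.normalize(query_feat, dim=1)
    protos = [fg_prototype] if bg_prototype is None else [bg_prototype, fg_prototype]
    logits = torch.stack([
        (q * F.normalize(p, dim=-1)[:, :, None, None]).sum(dim=1) for p in protos
    ], dim=1)                                 # (B, n_proto, H, W)
    return tau * logits
```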
arXiv Detail & Related papers (2021-04-19T11:21:47Z)