Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation
- URL: http://arxiv.org/abs/2508.05213v1
- Date: Thu, 07 Aug 2025 09:48:24 GMT
- Title: Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation
- Authors: Jianming Liu, Wenlong Qiu, Haitao Wei
- Abstract summary: Few-Shot Segmentation (FSS) aims at efficient segmentation of new objects with only a few labeled samples. Cross-Domain Few-Shot Segmentation (CD-FSS) is proposed to mitigate the performance degradation that arises when domain discrepancies exist between training and deployment.
- Score: 0.979247551980983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot Segmentation (FSS) aims at efficient segmentation of new objects with only a few labeled samples. However, its performance degrades significantly when domain discrepancies exist between training and deployment. Cross-Domain Few-Shot Segmentation (CD-FSS) is proposed to mitigate such performance degradation. Current CD-FSS methods primarily seek to develop segmentation models on a source domain that are capable of cross-domain generalization. However, driven by escalating concerns over data privacy and the imperative to minimize data transfer and training expenses, the development of source-free CD-FSS approaches has become essential. In this work, we propose a source-free CD-FSS method that leverages both textual and visual information to facilitate target-domain task adaptation without requiring source-domain data. Specifically, we first append Task-Specific Attention Adapters (TSAA) to the feature pyramid of a pretrained backbone, which adapt multi-level features extracted from the shared pretrained backbone to the target task. Then, the parameters of the TSAA are trained through a Visual-Visual Embedding Alignment (VVEA) module and a Text-Visual Embedding Alignment (TVEA) module. The VVEA module utilizes global-local visual features to align image features across different views, while the TVEA module leverages textual priors from pre-aligned multi-modal features (e.g., from CLIP) to guide cross-modal adaptation. By combining the outputs of these modules through dense comparison operations and subsequent fusion via skip connections, our method produces refined prediction masks. Under both 1-shot and 5-shot settings, the proposed approach achieves average segmentation accuracy improvements of 2.18% and 4.11%, respectively, across four cross-domain datasets, significantly outperforming state-of-the-art CD-FSS methods. Code is available at https://github.com/ljm198134/TVGTANet.
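The abstract above outlines a concrete adaptation recipe: a frozen backbone, trainable attention adapters on its feature pyramid, a view-consistency objective (VVEA) and a text-visual alignment objective (TVEA). The PyTorch sketch below illustrates that recipe only; the stand-in backbone, adapter design, augmentation, pooling, and equal loss weighting are all assumptions for illustration, not the released TVGTANet implementation.

```python
# Hedged sketch of the adaptation recipe described in the abstract: a frozen
# backbone, a trainable attention adapter (TSAA-style), a view-consistency loss
# (VVEA-style) and a text-visual alignment loss (TVEA-style). All shapes,
# module designs, and the equal loss weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAdapter(nn.Module):
    """Toy task-specific attention adapter: channel attention plus a residual projection."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, feat):
        return feat + self.proj(feat * self.attn(feat))

def adaptation_losses(view_feats, text_embed):
    """Simplified VVEA (view-to-view) and TVEA (text-to-visual) alignment terms."""
    v1, v2 = view_feats                              # features of two augmented views
    g1 = F.normalize(v1.mean(dim=(2, 3)), dim=-1)    # global pooled embeddings
    g2 = F.normalize(v2.mean(dim=(2, 3)), dim=-1)
    vvea = 1 - (g1 * g2).sum(-1).mean()              # pull the two views together
    t = F.normalize(text_embed, dim=-1)              # e.g. a pre-extracted CLIP-style text prior
    tvea = 1 - (g1 * t).sum(-1).mean()               # pull visual embedding toward the text prior
    return vvea, tvea

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())  # stand-in for one pyramid level
for p in backbone.parameters():
    p.requires_grad_(False)                          # source-trained backbone stays frozen
adapter = AttentionAdapter(64)                       # only the adapter is trained
optim = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

support = torch.randn(2, 3, 64, 64)                  # few-shot target-domain support images
views = [support + 0.05 * torch.randn_like(support) for _ in range(2)]  # two simple "views"
text_embed = torch.randn(2, 64)                      # assumed text prior matching the feature width

adapted = [adapter(backbone(v)) for v in views]
vvea, tvea = adaptation_losses(adapted, text_embed)
loss = vvea + tvea                                   # equal weighting is an assumption
loss.backward()
optim.step()
```

In the full method, the adapted pyramid features would additionally go through the dense-comparison and skip-connection fusion steps to produce the refined prediction mask.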
Related papers
- Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation [14.660710170156202]
Cross-domain few-shot segmentation (CD-FSS) pre-trains the model on a source-domain dataset with sufficient samples. On target domains, we freeze the model and fine-tune the DFN to learn target-specific knowledge. Our method surpasses the state-of-the-art method in CD-FSS significantly, by 2.69% and 4.68% MIoU in the 1-shot and 5-shot scenarios.
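A minimal sketch of the freeze-then-fine-tune pattern this summary describes may help: the source-trained model is frozen on the target domain and only a small target-specific module is updated on the labelled support samples. The stand-in model and adapter head below are assumptions, not the paper's DFN.

```python
# Hedged sketch of the freeze-then-fine-tune pattern: the source-trained model is
# frozen and only a small target-specific module is updated on support images.
import torch
import torch.nn as nn
import torch.nn.functional as F

pretrained = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())  # stand-in source model
for p in pretrained.parameters():
    p.requires_grad_(False)                      # source knowledge stays fixed

adapter = nn.Conv2d(32, 2, 1)                    # tiny target-specific head (fg/bg logits)
optim = torch.optim.SGD(adapter.parameters(), lr=1e-2)

support_img = torch.randn(1, 3, 32, 32)          # 1-shot support image
support_mask = torch.randint(0, 2, (1, 32, 32))  # its binary ground-truth mask

logits = adapter(pretrained(support_img))        # gradients reach only the adapter
loss = F.cross_entropy(logits, support_mask)
loss.backward()
optim.step()
```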
arXiv Detail & Related papers (2025-06-09T02:51:06Z) - Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining [53.963279865355105]
Cross-domain few-shot segmentation (CD-FSS) aims to segment objects of novel classes in new domains. Most CD-FSS methods redesign and retrain in-domain FSS models using various domain-generalization techniques. We propose adapting informative model structures of the well-trained FSS model for target domains by learning domain characteristics from few-shot labeled support samples.
arXiv Detail & Related papers (2025-04-30T08:16:33Z) - TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation [40.49924427388922]
We propose a task-adaptive auto-visual prompt framework for Cross-Domain Few-shot Segmentation (CD-FSS). We incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain feature extraction and generate high-quality, learnable visual prompts. Our model outperforms the state-of-the-art CD-FSS approach, achieving average accuracy improvements of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
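As a rough illustration of the learnable-visual-prompt idea mentioned above, the sketch below prepends a few trainable prompt tokens to frozen patch features and optimises only the prompts. The toy encoder, token sizes, and placeholder objective are assumptions; this is not the CDTAP module itself.

```python
# Hedged sketch of learnable visual prompts: trainable prompt tokens are prepended
# to frozen patch features and only the prompts are optimised.
import torch
import torch.nn as nn

dim, n_prompts = 64, 4
encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
for p in encoder.parameters():
    p.requires_grad_(False)                              # encoder stays frozen

prompts = nn.Parameter(torch.zeros(1, n_prompts, dim))   # learnable visual prompts
optim = torch.optim.AdamW([prompts], lr=1e-3)

patch_tokens = torch.randn(2, 16, dim)                   # frozen features of 16 image patches
tokens = torch.cat([prompts.expand(2, -1, -1), patch_tokens], dim=1)
out = encoder(tokens)[:, n_prompts:]                     # drop prompt positions after encoding
loss = out.pow(2).mean()                                 # placeholder objective for illustration
loss.backward()
optim.step()
```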
arXiv Detail & Related papers (2024-09-09T07:43:58Z) - APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation [33.90244697752314]
We introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS).
Our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy on 1-shot and 5-shot settings, respectively.
arXiv Detail & Related papers (2024-06-12T16:20:58Z) - DARNet: Bridging Domain Gaps in Cross-Domain Few-Shot Segmentation with Dynamic Adaptation [20.979759016826378]
Few-shot segmentation (FSS) aims to segment novel classes in a query image by using only a small number of supporting images from base classes.
In cross-domain FSS, leveraging features from label-rich domains for resource-constrained domains poses challenges due to domain discrepancies.
This work presents a Dynamically Adaptive Refine (DARNet) method that aims to balance generalization and specificity for CD-FSS.
arXiv Detail & Related papers (2023-12-08T03:03:22Z) - Adaptive Semantic Consistency for Cross-domain Few-shot Classification [27.176106714652327]
Cross-domain few-shot classification (CD-FSC) aims to identify novel target classes with a few samples.
We propose a simple plug-and-play Adaptive Semantic Consistency (ASC) framework, which improves cross-domain robustness.
The proposed ASC enables explicit transfer of source domain knowledge to prevent the model from overfitting the target domain.
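A minimal sketch of this kind of semantic-consistency regularisation follows, under the assumption that it can be approximated by keeping the adapted model's predictions close to a frozen copy of the source-trained model while fine-tuning on the few target samples. The toy classifier and the fixed 0.5 weight are illustrative, not the ASC formulation.

```python
# Hedged sketch of a semantic-consistency regulariser: the adapted model is kept
# close to a frozen source-model snapshot so source knowledge is not forgotten.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(128, 5)                      # stand-in classifier being adapted
source_model = copy.deepcopy(model).eval()     # frozen snapshot of the source-trained model
for p in source_model.parameters():
    p.requires_grad_(False)

optim = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(4, 128)                        # few-shot target features
y = torch.randint(0, 5, (4,))                  # their labels

ce = F.cross_entropy(model(x), y)
consistency = F.kl_div(                        # keep adapted predictions near source predictions
    F.log_softmax(model(x), dim=-1),
    F.softmax(source_model(x), dim=-1),
    reduction="batchmean",
)
loss = ce + 0.5 * consistency                  # 0.5 is an assumed weight
loss.backward()
optim.step()
```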
arXiv Detail & Related papers (2023-08-01T15:37:19Z) - I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation [55.633859439375044]
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work.
The key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly.
This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation.
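As a generic illustration of combining image-level and feature-level adaptation, the sketch below re-normalises source-image statistics toward a target image (image level) and penalises the discrepancy between mean features of the two domains (feature level). Both terms are simple stand-ins, not the I2F pipeline.

```python
# Hedged sketch of joint image-level and feature-level adaptation for UDA
# segmentation, using AdaIN-style statistic matching and a mean-feature penalty.
import torch
import torch.nn as nn

def match_image_stats(src, tgt, eps=1e-5):
    """AdaIN-style: give the source image the target image's per-channel mean/std."""
    s_mu, s_std = src.mean((2, 3), keepdim=True), src.std((2, 3), keepdim=True) + eps
    t_mu, t_std = tgt.mean((2, 3), keepdim=True), tgt.std((2, 3), keepdim=True) + eps
    return (src - s_mu) / s_std * t_std + t_mu

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
src_img, tgt_img = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)

src_like_tgt = match_image_stats(src_img, tgt_img)               # image-level adaptation
f_src, f_tgt = encoder(src_like_tgt), encoder(tgt_img)
feat_align = (f_src.mean((0, 2, 3)) - f_tgt.mean((0, 2, 3))).pow(2).mean()  # feature-level term
# A full pipeline would add the supervised segmentation loss on source labels and
# back-propagate feat_align jointly; omitted here for brevity.
print(float(feat_align))
```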
arXiv Detail & Related papers (2023-01-03T15:19:48Z) - Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection [86.98573522894961]
We propose a framework that generalizes the components commonly used by Unsupervised Domain Adaptation (UDA) algorithms for detection.
Specifically, we propose a novel UDA algorithm, ViSGA, that leverages the best design choices and introduces a simple but effective method to aggregate features at instance-level.
We show that both similarity-based grouping and adversarial training allow our model to focus on coarsely aligning feature groups, without being forced to match all instances across loosely aligned domains.
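A rough sketch of similarity-based grouping followed by adversarial alignment is given below, assuming a greedy cosine-similarity grouping rule and a gradient-reversal discriminator; both are generic stand-ins rather than the ViSGA design.

```python
# Hedged sketch: group instance features by cosine similarity, then align the
# aggregated group features adversarially via a gradient-reversal discriminator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                              # reversed gradients make features domain-confusing

def group_by_similarity(feats, thr=0.8):
    """Greedily merge instance features whose cosine similarity to a group centre exceeds thr."""
    feats = F.normalize(feats, dim=-1)
    groups = []                                   # each group is a list of feature vectors
    for f in feats:
        placed = False
        for g in groups:
            centre = F.normalize(torch.stack(g).mean(0), dim=0)
            if torch.dot(f, centre) > thr:
                g.append(f)
                placed = True
                break
        if not placed:
            groups.append([f])
    return torch.stack([torch.stack(g).mean(0) for g in groups])  # one aggregated feature per group

disc = nn.Linear(64, 1)                           # domain discriminator on group features
src_inst = torch.randn(6, 64, requires_grad=True) # instance features from source detections
tgt_inst = torch.randn(5, 64, requires_grad=True) # instance features from target detections

src_groups = group_by_similarity(src_inst)
tgt_groups = group_by_similarity(tgt_inst)
logits = disc(GradReverse.apply(torch.cat([src_groups, tgt_groups])))
labels = torch.cat([torch.zeros(len(src_groups), 1), torch.ones(len(tgt_groups), 1)])
adv_loss = F.binary_cross_entropy_with_logits(logits, labels)
adv_loss.backward()                               # discriminator learns domains; features get reversed grads
```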
arXiv Detail & Related papers (2021-10-04T13:09:56Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z) - DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation [97.74059510314554]
Unsupervised domain adaptation (UDA) for semantic segmentation aims to adapt a segmentation model trained on the labeled source domain to the unlabeled target domain.
Existing methods try to learn domain invariant features while suffering from large domain gaps.
We propose a novel Dual Soft-Paste (DSP) method in this paper.
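As a hedged illustration of the soft-paste idea, the sketch below blends the pixels of one selected class from a labelled source image into a second source image and into an unlabelled target image with a soft weight. The class choice and blending weight are assumptions for illustration, not the exact DSP recipe.

```python
# Hedged sketch of a soft-paste operation: class pixels from a labelled source
# image are softly blended onto both another source image and a target image.
import torch

def soft_paste(canvas, src_img, src_mask, cls, alpha=0.8):
    """Blend the pixels of class `cls` from src_img onto canvas with weight alpha."""
    region = (src_mask == cls).unsqueeze(0).float()          # (1, H, W), broadcast over RGB
    return canvas * (1 - alpha * region) + src_img * (alpha * region)

src_img = torch.rand(3, 64, 64)           # labelled source image
src_mask = torch.randint(0, 5, (64, 64))  # its segmentation mask
other_src = torch.rand(3, 64, 64)         # second source image
tgt_img = torch.rand(3, 64, 64)           # unlabelled target image

cls = 2                                   # class selected for pasting (assumed)
mixed_src = soft_paste(other_src, src_img, src_mask, cls)    # source-to-source paste
mixed_tgt = soft_paste(tgt_img, src_img, src_mask, cls)      # source-to-target paste
# The pasted pixels keep their source labels, so both mixed images can supervise the
# segmentation model; the target branch would typically also rely on pseudo-labels.
```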
arXiv Detail & Related papers (2021-07-20T16:22:40Z) - AlignSeg: Feature-Aligned Segmentation Networks [109.94809725745499]
We propose Feature-Aligned Networks (AlignSeg) to address misalignment issues during the feature aggregation process.
Our network achieves new state-of-the-art mIoU scores of 82.6% and 45.95% on the Cityscapes and ADE20K datasets, respectively.
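A minimal sketch of offset-based feature alignment during aggregation follows, assuming a small convolutional head predicts per-pixel 2D offsets used to warp the upsampled coarse features before fusion. The shapes, offset scaling, and head design are illustrative, not AlignSeg's exact modules.

```python
# Hedged sketch of offset-based feature alignment: predicted offsets warp the
# upsampled coarse features so they line up with the fine features before fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, offset):
    """Sample feat at (identity grid + offset); offsets are in normalised [-1, 1] units."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    return F.grid_sample(feat, grid + offset.permute(0, 2, 3, 1), align_corners=True)

high = torch.randn(1, 32, 64, 64)                         # fine, high-resolution features
low = torch.randn(1, 32, 16, 16)                          # coarse, low-resolution features
low_up = F.interpolate(low, size=(64, 64), mode="bilinear", align_corners=True)

offset_head = nn.Conv2d(64, 2, kernel_size=3, padding=1)  # predicts per-pixel (dx, dy)
offset = offset_head(torch.cat([high, low_up], dim=1))
aligned = warp(low_up, 0.1 * torch.tanh(offset))          # bounded offsets (assumed scaling)
fused = high + aligned                                    # aggregation after alignment
print(fused.shape)
```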
arXiv Detail & Related papers (2020-02-24T10:00:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.