Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
- URL: http://arxiv.org/abs/2408.05936v1
- Date: Mon, 12 Aug 2024 06:23:10 GMT
- Title: Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
- Authors: Ke Zhou, Zhongwei Qiu, Dongmei Fu
- Abstract summary: We introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM.
MCA-SAM enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels.
Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains.
- Score: 12.36950265154199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundational vision models, such as the Segment Anything Model (SAM), have achieved significant breakthroughs through extensive pre-training on large-scale visual datasets. Despite their general success, these models may fall short in specialized tasks with limited data, and fine-tuning such large-scale models is often not feasible. Current strategies involve incorporating adaptors into the pre-trained SAM to facilitate downstream task performance with minimal model adjustment. However, these strategies can be hampered by suboptimal learning approaches for the adaptors. In this paper, we introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM, which enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels. Our Token-level Contrastive adaptor (TC-adaptor) focuses on refining local representations by improving the discriminability of patch tokens, while the Sample-level Contrastive adaptor (SC-adaptor) amplifies global understanding across different samples. Together, these adaptors synergistically enhance feature comparison within and across samples, bolstering the model's representational strength and its ability to adapt to new tasks. Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains: camouflage object detection, shadow segmentation, and polyp segmentation. Specifically, MCA-SAM exhibits substantial relative performance enhancements, achieving a 20.0% improvement in MAE on the COD10K dataset, a 6.0% improvement in MAE on the CAMO dataset, a 15.4% improvement in BER on the ISTD dataset, and a 7.9% improvement in mDice on the Kvasir-SEG dataset.
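The token- and sample-level objectives described here are both InfoNCE-style contrastive losses. Below is a minimal PyTorch sketch, assuming the adaptor outputs patch tokens and pooled image embeddings for two augmented views; the function names and loss weights are hypothetical, not the authors' released code.

```python
# Illustrative sketch of token- and sample-level contrastive losses
# (assumed details; not the MCA-SAM release). Embeddings come from a
# frozen SAM image encoder equipped with lightweight trainable adaptors.
import torch
import torch.nn.functional as F

def token_contrastive_loss(tokens_a, tokens_b, temperature=0.07):
    """InfoNCE over patch tokens: matching positions across two augmented
    views are positives, all other tokens in the image are negatives.
    tokens_*: (N, D) patch embeddings of the same image."""
    a = F.normalize(tokens_a, dim=-1)
    b = F.normalize(tokens_b, dim=-1)
    logits = a @ b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def sample_contrastive_loss(feats_a, feats_b, temperature=0.07):
    """The same InfoNCE objective applied to pooled per-image embeddings,
    so the other samples in the batch act as negatives.
    feats_*: (B, D) global embeddings of two views of a batch."""
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Combined adaptor objective (weights are hypothetical):
# loss = seg_loss + 0.1 * token_contrastive_loss(t1, t2) \
#                 + 0.1 * sample_contrastive_loss(g1, g2)
```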
Related papers
- SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation [3.5534229601986294]
This paper addresses the domain adaptation challenge for semantic segmentation in medical imaging.
Recent approaches that perform end-to-end fine-tuning of such large models are often computationally intractable.
We propose a novel SAM adapter approach that minimizes the number of trainable parameters while achieving comparable performances to full fine-tuning.
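A minimal sketch of the kind of parameter-efficient bottleneck adapter such approaches train while the backbone stays frozen; the class name and reduction factor are assumptions, and the paper's exact decoder adapter may differ.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual adapter trained alongside a frozen backbone."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)   # project to low dimension
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)     # project back up

    def forward(self, x):
        # frozen-backbone features plus a small learned update
        return x + self.up(self.act(self.down(x)))
```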
arXiv Detail & Related papers (2025-01-12T15:08:29Z)
- Continual Learning for Segment Anything Model Adaptation [14.00191851894315]
We propose a novel Continual SAM adaptation (CoSAM) benchmark with 8 different task domains.
We then propose a novel simple-yet-effective Mixture of Domain Adapters (MoDA) algorithm to help the SAM encoder extract well-separated features for different task domains.
Our MoDA maintains highly competitive results in the natural image domain, approaching the zero-shot performance of the original SAM.
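A hedged sketch of what a mixture of domain adapters could look like, with a learned gate softly routing tokens over per-domain adapters; the routing scheme and module shapes here are assumptions, not the MoDA implementation.

```python
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    """One bottleneck adapter per task domain, mixed by a soft gate."""
    def __init__(self, dim, num_domains, reduction=4):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim // reduction), nn.GELU(),
                          nn.Linear(dim // reduction, dim))
            for _ in range(num_domains))
        self.router = nn.Linear(dim, num_domains)   # soft domain gate

    def forward(self, x):                           # x: (B, N, D) tokens
        gate = self.router(x.mean(dim=1)).softmax(dim=-1)           # (B, K)
        outs = torch.stack([a(x) for a in self.adapters], dim=-1)   # (B, N, D, K)
        return x + (outs * gate[:, None, None, :]).sum(dim=-1)
```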
arXiv Detail & Related papers (2024-12-09T11:51:28Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
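A minimal sketch of uncertainty-adaptive label smoothing in this spirit, where higher per-sample uncertainty yields a more smoothed target; the schedule and maximum smoothing value are assumptions.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smooth=0.2):
    """logits: (B, C); targets: (B,) class ids;
    uncertainty: (B,) values in [0, 1] from some per-sample estimator."""
    eps = max_smooth * uncertainty                     # per-sample smoothing
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, targets[:, None]).squeeze(1)
    uniform = -log_probs.mean(dim=-1)                  # loss vs. uniform target
    return ((1 - eps) * nll + eps * uniform).mean()
```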
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning [41.59855801010565]
Large multimodal models (LMMs) can potentially act as general-purpose assistants and are expected to be robust to distribution shifts.
Despite this, domain-specific adaptation is still necessary, particularly in specialized areas such as healthcare.
This work investigates in-context learning (ICL) as an effective alternative for enhancing LMMs' adaptability.
arXiv Detail & Related papers (2024-05-20T17:59:21Z)
- GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation [22.344399402787644]
This paper tackles a novel yet challenging problem: how to transfer knowledge from the emerging Segment Anything Model (SAM) to panoramic semantic segmentation.
We propose a framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information; the TA is integrated with SAM to generate ensemble logits.
Experiments on two benchmarks show that our GoodSAM achieves a remarkable +3.75% mIoU improvement over the state-of-the-art (SOTA) domain adaptation methods.
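A hedged sketch of the ensemble-logits idea, assuming both branches have already been mapped to aligned per-pixel class logits (SAM's raw masks are class-agnostic, so this mapping is itself an assumption, as are the fusion weight and the distillation detail).

```python
import torch

def ensemble_logits(ta_logits, sam_logits, alpha=0.5):
    """ta_logits, sam_logits: (B, C, H, W) per-pixel class logits at the
    same resolution; alpha weights the teacher-assistant branch."""
    return alpha * ta_logits + (1 - alpha) * sam_logits

# The student can then be distilled against softmax(ensemble_logits / T)
# as soft targets, alongside the usual supervised segmentation loss.
```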
arXiv Detail & Related papers (2024-03-25T02:30:32Z)
- SAMDA: Leveraging SAM on Few-Shot Domain Adaptation for Electronic Microscopy Segmentation [3.7562258027956186]
We present a new few-shot domain adaptation framework SAMDA.
It combines the Segment Anything Model (SAM) with nnUNet in the embedding space to achieve high transferability and accuracy.
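A hypothetical sketch of fusing two encoders in the embedding space, since the summary does not specify SAMDA's fusion mechanism; the dimensions, projection layers, and additive merge are all assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    """Project SAM and nnUNet features to a shared width, then merge."""
    def __init__(self, sam_dim, nnunet_dim, out_dim):
        super().__init__()
        self.proj_sam = nn.Conv2d(sam_dim, out_dim, kernel_size=1)
        self.proj_unet = nn.Conv2d(nnunet_dim, out_dim, kernel_size=1)

    def forward(self, sam_feat, unet_feat):
        # resize nnUNet features to SAM's spatial grid, then add
        unet_feat = nn.functional.interpolate(
            unet_feat, size=sam_feat.shape[-2:], mode='bilinear',
            align_corners=False)
        return self.proj_sam(sam_feat) + self.proj_unet(unet_feat)
```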
arXiv Detail & Related papers (2024-03-12T02:28:29Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation [49.82432158155329]
We propose an instance-specific and model-adaptive supervision for semi-supervised semantic segmentation, named iMAS.
iMAS learns from unlabeled instances progressively by weighing their corresponding consistency losses based on the evaluated hardness.
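A minimal sketch of hardness-based consistency weighting in this spirit, taking teacher confidence as the inverse of hardness; the specific hardness estimate and MSE consistency term are assumptions, not iMAS's exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_consistency_loss(student_logits, teacher_logits):
    """Both: (B, C, H, W). Per-sample weight = teacher's mean
    max-probability, i.e. one minus an estimated hardness."""
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)
        confidence = probs.max(dim=1).values.mean(dim=(1, 2))      # (B,)
    per_sample = F.mse_loss(student_logits.softmax(dim=1), probs,
                            reduction='none').mean(dim=(1, 2, 3))  # (B,)
    return (confidence * per_sample).mean()
```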
arXiv Detail & Related papers (2022-11-21T10:37:28Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
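A minimal co-finetuning step sketch, summing losses from an upstream and a downstream batch in the same update rather than fine-tuning sequentially; the batch structure and equal loss weighting are assumptions.

```python
import torch

def cofinetune_step(model, upstream_batch, downstream_batch,
                    criterion, optimizer):
    """One optimisation step over both tasks with a single shared model."""
    optimizer.zero_grad()
    loss = (criterion(model(upstream_batch['x']), upstream_batch['y']) +
            criterion(model(downstream_batch['x']), downstream_batch['y']))
    loss.backward()                     # gradients from both tasks at once
    optimizer.step()
    return loss.item()
```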
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense the dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
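A hedged sketch of multi-scale feature alignment in this spirit, matching per-scale mean features of real and synthetic batches; the layer choice and squared-error distance are assumptions, not CAFE's exact objective.

```python
import torch

def feature_alignment_loss(real_feats, syn_feats):
    """real_feats, syn_feats: lists of (B, C_l, H_l, W_l) activations taken
    at several network depths (scales), matched pairwise by depth."""
    loss = 0.0
    for fr, fs in zip(real_feats, syn_feats):
        # align per-scale mean features between real and synthetic data
        loss = loss + (fr.mean(dim=0) - fs.mean(dim=0)).pow(2).mean()
    return loss
```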
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)