CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
- URL: http://arxiv.org/abs/2402.03631v3
- Date: Tue, 16 Jul 2024 01:23:47 GMT
- Title: CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
- Authors: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu
- Abstract summary: This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks.
CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters.
Both CAT-SAM variants consistently achieve superior target segmentation performance, even under the very challenging one-shot adaptation setup.
- Score: 90.26396410706857
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder, which lead to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants consistently achieve superior target segmentation performance, even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM
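The conditional-tuning design is compact enough to sketch. Below is a minimal, hypothetical PyTorch sketch of the prompt-token variant: every SAM weight stays frozen, a handful of learnable tokens are injected into the image encoder's input sequence, and a small bridge maps the mask decoder's prompt token into the encoder's token space so the two halves adapt jointly. The class names, dimensions, and bridge architecture here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PromptBridge(nn.Module):
    """Map the mask decoder's prompt token into the image-encoder token
    space (the two-layer MLP and all dimensions are assumptions)."""
    def __init__(self, decoder_dim: int = 256, encoder_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(decoder_dim, encoder_dim),
            nn.GELU(),
            nn.Linear(encoder_dim, encoder_dim),
        )

    def forward(self, prompt_token: torch.Tensor) -> torch.Tensor:
        # (B, decoder_dim) -> (B, 1, encoder_dim): one extra token to
        # prepend to the encoder's patch-token sequence.
        return self.proj(prompt_token).unsqueeze(1)

class ConditionalPromptTuning(nn.Module):
    """Prompt-token variant: a few learnable tokens, plus the bridged
    decoder token, are placed in front of the frozen encoder's patches."""
    def __init__(self, num_tokens: int = 8,
                 encoder_dim: int = 768, decoder_dim: int = 256):
        super().__init__()
        self.tokens = nn.Parameter(torch.zeros(num_tokens, encoder_dim))
        nn.init.trunc_normal_(self.tokens, std=0.02)
        self.bridge = PromptBridge(decoder_dim, encoder_dim)

    def forward(self, patch_tokens: torch.Tensor,
                decoder_prompt_token: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.shape[0]
        learnable = self.tokens.unsqueeze(0).expand(b, -1, -1)
        bridged = self.bridge(decoder_prompt_token)
        # Concatenate [bridged token | learnable tokens | patch tokens].
        return torch.cat([bridged, learnable, patch_tokens], dim=1)

if __name__ == "__main__":
    tuner = ConditionalPromptTuning()
    patches = torch.randn(2, 196, 768)    # stand-in frozen-ViT patch tokens
    dec_tok = torch.randn(2, 256)         # stand-in decoder prompt token
    print(tuner(patches, dec_tok).shape)  # torch.Size([2, 205, 768])
```

Under such a scheme, only the tuner's parameters would be handed to the optimizer while the SAM backbone is kept with requires_grad=False, which is what keeps the tunable parameter count small.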
Related papers
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM [62.85895749882285]
Marine Animal Segmentation (MAS) involves segmenting animals within marine environments.
We propose a novel feature learning framework, named Dual-SAM, for high-performance MAS.
Our proposed method achieves state-of-the-art performance on five widely-used MAS datasets.
arXiv Detail & Related papers (2024-04-07T15:34:40Z)
- SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts [28.171383990186904]
We construct a large CT dataset consisting of 1.1M CT images and 5M masks from public datasets.
We propose SAMCT, a powerful foundation model that allows labor-free task-indicator prompts.
Based on SAM, SAMCT is further equipped with a CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder.
arXiv Detail & Related papers (2024-03-20T02:39:15Z)
- PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation [19.65118388712439]
We introduce a novel prompt-driven adapter into SAM, namely the Prompt Adapter Segment Anything Model (PA-SAM).
By exclusively training the prompt adapter, PA-SAM extracts detailed information from images and optimizes the mask decoder features at both sparse and dense prompt levels.
Experimental results demonstrate that our PA-SAM outperforms other SAM-based methods in high-quality, zero-shot, and open-set segmentation.
arXiv Detail & Related papers (2024-01-23T19:20:22Z)
- BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model [65.92173280096588]
We address the challenge of image resolution variation for the Segment Anything Model (SAM).
SAM, known for its zero-shot generalizability, exhibits performance degradation when faced with datasets of varying image sizes.
We present a bias-mode attention mask that allows each token to prioritize neighboring information.
arXiv Detail & Related papers (2024-01-04T15:34:44Z)
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything [36.553867358541154]
Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications.
We propose EfficientSAMs, lightweight SAM models that exhibit decent performance with largely reduced complexity.
Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning.
arXiv Detail & Related papers (2023-12-01T18:31:00Z)
- AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder [101.28268762305916]
In this work, we replace the Segment Anything Model's prompt encoder with an encoder that operates on the same input image.
We obtain state-of-the-art results on multiple medical image and video benchmarks.
To inspect the knowledge it encodes, and to provide a lightweight segmentation solution, we also learn to decode it into a mask with a shallow deconvolution network (an illustrative sketch of such a head appears after this list).
arXiv Detail & Related papers (2023-06-10T07:27:00Z)
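The shallow deconvolution decoder mentioned in the AutoSAM entry above is simple to picture. The following is an illustrative sketch of such a lightweight mask head, upsampling an encoder feature map to per-pixel mask logits; the class name, channel counts, and layer choices are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ShallowDeconvDecoder(nn.Module):
    """Hypothetical lightweight mask head: a few transposed convolutions
    that upsample a feature map to a single-channel mask (sizes assumed)."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, 128, kernel_size=2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),
            nn.GELU(),
            nn.Conv2d(64, 1, kernel_size=1),  # per-pixel mask logits
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Each transposed convolution doubles the spatial resolution,
        # so a 64x64 feature map becomes a 256x256 mask.
        return self.up(feats)

if __name__ == "__main__":
    head = ShallowDeconvDecoder()
    feats = torch.randn(1, 256, 64, 64)   # stand-in encoder feature map
    print(head(feats).shape)              # torch.Size([1, 1, 256, 256])
```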
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.