Related papers: GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation

URL: http://arxiv.org/abs/2408.09115v1
Date: Sat, 17 Aug 2024 06:53:10 GMT
Title: GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation
Authors: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang
Abstract summary: GoodSAM++ is a novel framework utilizing the powerful zero-shot instance segmentation capability of SAM (i.e., teacher) to learn a compact panoramic semantic segmentation model. GoodSAM++ addresses two critical challenges: 1) SAM's inability to provide semantic labels and inherent distortion problems of panoramic images; 2) the significant capacity disparity between SAM and the student.
Score: 22.344399402787644
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents GoodSAM++, a novel framework utilizing the powerful zero-shot instance segmentation capability of SAM (i.e., teacher) to learn a compact panoramic semantic segmentation model, i.e., student, without requiring any labeled data. GoodSAM++ addresses two critical challenges: 1) SAM's inability to provide semantic labels and inherent distortion problems of panoramic images; 2) the significant capacity disparity between SAM and the student. The "out-of-the-box" insight of GoodSAM++ is to introduce a teacher assistant (TA) to provide semantic information for SAM, integrated with SAM to obtain reliable pseudo semantic maps to bridge both domain and capacity gaps. To make this possible, we first propose a Distortion-Aware Rectification (DARv2) module to address the domain gap. It effectively mitigates the object deformation and distortion problem in panoramic images to obtain pseudo semantic maps. We then introduce a Multi-level Knowledge Adaptation (MKA) module to efficiently transfer the semantic information from the TA and pseudo semantic maps to our compact student model, addressing the significant capacity gap. We conduct extensive experiments on both outdoor and indoor benchmark datasets, showing that our GoodSAM++ achieves a remarkable performance improvement over the state-of-the-art (SOTA) domain adaptation methods. Moreover, diverse open-world scenarios demonstrate the generalization capacity of our GoodSAM++. Last but not least, our most lightweight student model achieves comparable performance to the SOTA models with only 3.7 million parameters.
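
The abstract describes combining class-agnostic masks from SAM with semantic predictions from a teacher assistant to obtain pseudo semantic maps. The sketch below illustrates one simple way such a combination could work: each mask takes the class the TA predicts most often inside it. This is an assumption made for illustration only; it is not the DARv2 module, and the function name `fuse_masks_with_semantics` and its inputs are hypothetical.

```python
from collections import Counter


def fuse_masks_with_semantics(instance_masks, ta_labels):
    """Give every pixel of each mask the TA's majority class inside that mask.

    instance_masks: list of 2D boolean grids, one per class-agnostic mask
                    (hypothetical stand-in for SAM output).
    ta_labels:      2D grid of per-pixel class ids from the teacher assistant.
    Pixels covered by no mask keep the TA prediction unchanged.
    """
    height, width = len(ta_labels), len(ta_labels[0])
    # Start from a copy of the TA prediction so uncovered pixels stay labeled.
    fused = [row[:] for row in ta_labels]
    for mask in instance_masks:
        # Count TA class ids over the pixels this mask covers.
        votes = Counter(
            ta_labels[y][x]
            for y in range(height)
            for x in range(width)
            if mask[y][x]
        )
        if not votes:
            continue  # empty mask: nothing to relabel
        majority_class, _ = votes.most_common(1)[0]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    fused[y][x] = majority_class
    return fused


# Toy 3x3 example: one mask covers the top-left 2x2 block.
ta_labels = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
mask = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]
pseudo_map = fuse_masks_with_semantics([mask], ta_labels)
print(pseudo_map)
```

In the toy example, three of the four pixels inside the mask are predicted as class 0, so the single class-1 pixel at row 1, column 1 is relabeled to 0 while everything outside the mask is left as the TA predicted it.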

Related papers

InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective [9.466559751950639]
The Segment Anything Model (SAM) exhibits impressive zero-shot capabilities in general tasks but struggles in specialized domains. We propose InfoSAM, an information-theoretic approach that enhances SAM fine-tuning by distilling and preserving its pre-trained segmentation knowledge. Experiments across diverse benchmarks validate InfoSAM's effectiveness in improving the SAM family's performance on real-world tasks.
arXiv Detail & Related papers (2025-05-28T03:09:22Z)
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks [15.061599989448867]
The Segment Anything Model (SAM) was originally designed for label-agnostic mask generation. We quantify SAM's semantic capabilities by comparing base image encoder efficacy under classification tasks. Our findings reveal a significant lack of semantic discriminability in SAM feature representations.
arXiv Detail & Related papers (2024-11-22T17:00:18Z)
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts. We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
ASAM: Boosting Segment Anything Model with Adversarial Tuning [9.566046692165884]
This paper introduces ASAM, a novel methodology that amplifies a foundation model's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations.
arXiv Detail & Related papers (2024-05-01T00:13:05Z)
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation [22.344399402787644]
This paper tackles a novel yet challenging problem: how to transfer knowledge from the emerging Segment Anything Model (SAM). We propose a framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information, integrated with SAM to generate ensemble logits. Experiments on two benchmarks show that our GoodSAM achieves a remarkable +3.75% mIoU improvement over the state-of-the-art (SOTA) domain adaptation methods.
arXiv Detail & Related papers (2024-03-25T02:30:32Z)
PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework. We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z)
WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images. To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters. Our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task, respectively.
arXiv Detail & Related papers (2024-03-14T10:30:43Z)
Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.42565443181017]
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model. Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics. We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts. This paper presents the first comprehensive analysis of SAM's segmentation stability across a diverse spectrum of prompt qualities. Our solution, termed Stable-SAM, offers several advantages: 1) improving SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
