Adapting Segment Anything Model for Unseen Object Instance Segmentation
- URL: http://arxiv.org/abs/2409.15481v1
- Date: Mon, 23 Sep 2024 19:05:50 GMT
- Title: Adapting Segment Anything Model for Unseen Object Instance Segmentation
- Authors: Rui Cao, Chuanxin Song, Biqi Yang, Jiangliu Wang, Pheng-Ann Heng, Yun-Hui Liu,
- Abstract summary: Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
- Score: 70.60171342436092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments. Previous approaches require full supervision on large-scale tabletop datasets for effective pretraining. In this paper, we propose UOIS-SAM, a data-efficient solution for the UOIS task that leverages SAM's high accuracy and strong generalization capabilities. UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder, mitigating issues introduced by the SAM baseline, such as background confusion and over-segmentation, especially in scenarios involving occlusion and texture-rich objects. Extensive experimental results on OCID, OSD, and additional photometrically challenging datasets including PhoCAL and HouseCat6D, demonstrate that, even using only 10% of the training samples compared to previous methods, UOIS-SAM achieves state-of-the-art performance in unseen object segmentation, highlighting its effectiveness and robustness in various tabletop scenes.
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend the SAM to Few-shot Semantic segmentation (FSS)
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - SOS: Segment Object System for Open-World Instance Segmentation With Object Priors [2.856781525749652]
We propose an approach to segment arbitrary unknown objects in images by generalizing from a limited set of annotated object classes during training.
Our approach shows strong generalization capabilities on COCO, LVIS, and ADE20k datasets and improves on the precision by up to 81.6% compared to the state-of-the-art.
arXiv Detail & Related papers (2024-09-22T23:35:31Z) - Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
Segment Anything Model (SAM) has been proposed as a visual fundamental model, which gives strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD)
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z) - PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z) - WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percent points on a ductal carcinoma in situ (DCIS) segmentation tasks and breast cancer metastasis segmentation task.
arXiv Detail & Related papers (2024-03-14T10:30:43Z) - From Generalization to Precision: Exploring SAM for Tool Segmentation in
Surgical Environments [7.01085327371458]
We argue that Segment Anything Model drastically over-segment images with high corruption levels, resulting in degraded performance.
We employ the ground-truth tool mask to analyze the results of SAM when the best single mask is selected as prediction.
We analyze the Endovis18 and Endovis17 instrument segmentation datasets using synthetic corruptions of various strengths and an In-House dataset featuring counterfactually created real-world corruptions.
arXiv Detail & Related papers (2024-02-28T01:33:49Z) - BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning
of SAM [37.1263294647351]
We introduce BLO-SAM, which finetunes the Segment Anything Model (SAM) based on bi-level optimization (BLO)
BLO-SAM reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset.
Results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
arXiv Detail & Related papers (2024-02-26T06:36:32Z) - ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment
Anything to SAR Domain for Semantic Segmentation [6.229326337093342]
Segment Anything Model (SAM) excels in various segmentation scenarios relying on semantic information and generalization ability.
The ClassWiseSAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images.
CWSAM showcases enhanced performance with fewer computing resources.
arXiv Detail & Related papers (2024-01-04T15:54:45Z) - SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment
Anything Model [85.85899655118087]
We develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.
SAMRS totally possesses 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude.
arXiv Detail & Related papers (2023-05-03T10:58:07Z) - Semantic Attention and Scale Complementary Network for Instance
Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB)
SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision to strengthen the activation of interest instances on the feature map.
SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.