Bridge the Points: Graph-based Few-shot Segment Anything Semantically
- URL: http://arxiv.org/abs/2410.06964v2
- Date: Fri, 11 Oct 2024 15:09:44 GMT
- Title: Bridge the Points: Graph-based Few-shot Segment Anything Semantically
- Authors: Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei,
- Abstract summary: Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend the SAM to Few-shot Semantic segmentation (FSS)
We propose a simple yet effective approach based on graph analysis.
- Score: 79.1519244940518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. Another subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by decomposing the weakly connected components of the directed graph in an efficient manner, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregate high-confident masks and filter out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validate the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available in https://andyzaq.github.io/GF-SAM/.
Related papers
- AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model [28.343378406337077]
We propose an automated prompting and mask calibration method called AM-SAM.
Our approach automatically generates prompts for an input image, eliminating the need for human involvement with a good performance in early training epochs.
Our experimental results demonstrate that AM-SAM achieves significantly accurate segmentation, matching or exceeding the effectiveness of human-generated and default prompts.
arXiv Detail & Related papers (2024-10-13T03:47:20Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - MaskUno: Switch-Split Block For Enhancing Instance Segmentation [0.0]
We propose replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors.
An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes.
arXiv Detail & Related papers (2024-07-31T10:12:14Z) - SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation [11.243400478302771]
Referring Expression Consistency (RES) aims to provide a segmentation mask of the target object in an image referred to by the text.
We propose a weakly-supervised bootstrapping architecture for RES with several new algorithmic innovations.
arXiv Detail & Related papers (2024-07-02T16:02:25Z) - From Generalization to Precision: Exploring SAM for Tool Segmentation in
Surgical Environments [7.01085327371458]
We argue that Segment Anything Model drastically over-segment images with high corruption levels, resulting in degraded performance.
We employ the ground-truth tool mask to analyze the results of SAM when the best single mask is selected as prediction.
We analyze the Endovis18 and Endovis17 instrument segmentation datasets using synthetic corruptions of various strengths and an In-House dataset featuring counterfactually created real-world corruptions.
arXiv Detail & Related papers (2024-02-28T01:33:49Z) - SODAR: Segmenting Objects by DynamicallyAggregating Neighboring Mask
Representations [90.8752454643737]
Recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per grid cell object masks with fully-convolutional networks.
We observe SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other as some may better segment certain object part.
Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information.
arXiv Detail & Related papers (2022-02-15T13:53:03Z) - Spatiotemporal Graph Neural Network based Mask Reconstruction for Video
Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in semi-supervised setting.
We propose a novel graph neuralS network (TG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z) - Online Multi-Object Tracking and Segmentation with GMPHD Filter and
Mask-based Affinity Fusion [79.87371506464454]
We propose a fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input.
The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model.
In the experiments on the two popular MOTS datasets, the key modules show some improvements.
arXiv Detail & Related papers (2020-08-31T21:06:22Z) - PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.