Unsupervised Instance Segmentation with Superpixels
- URL: http://arxiv.org/abs/2509.05352v1
- Date: Wed, 03 Sep 2025 02:46:48 GMT
- Title: Unsupervised Instance Segmentation with Superpixels
- Authors: Cuong Manh Hoang,
- Abstract summary: We present a new framework that efficiently segments objects without the need for human annotations. We compute a novel superpixel-guided mask loss using high-quality coarse masks and superpixels segmented from low-level image features. We conduct experiments on public datasets in instance segmentation and object detection to demonstrate the effectiveness of the proposed framework.
- Score: 2.538209532048867
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Instance segmentation is essential for numerous computer vision applications, including robotics, human-computer interaction, and autonomous driving. Currently, popular models bring impressive performance in instance segmentation by training with a large number of human annotations, which are costly to collect. For this reason, we present a new framework that efficiently and effectively segments objects without the need for human annotations. Firstly, a MultiCut algorithm is applied to self-supervised features for coarse mask segmentation. Then, a mask filter is employed to obtain high-quality coarse masks. To train the segmentation network, we compute a novel superpixel-guided mask loss, comprising hard loss and soft loss, with high-quality coarse masks and superpixels segmented from low-level image features. Lastly, a self-training process with a new adaptive loss is proposed to improve the quality of predicted masks. We conduct experiments on public datasets in instance segmentation and object detection to demonstrate the effectiveness of the proposed framework. The results show that the proposed framework outperforms previous state-of-the-art methods.
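The abstract does not define the hard and soft terms of the superpixel-guided mask loss. As a rough illustration only, the sketch below assumes one plausible reading: the hard term snaps the coarse mask to superpixel boundaries by majority vote and applies binary cross-entropy, while the soft term uses each superpixel's coarse-mask coverage fraction as a soft target. The function name `superpixel_guided_loss`, the 0.5 vote threshold, and the `soft_weight` parameter are hypothetical, and the superpixel ids could come from any low-level method such as SLIC (also an assumption).

```python
import numpy as np


def superpixel_guided_loss(pred, coarse_mask, superpixels, soft_weight=0.5, eps=1e-6):
    """Hypothetical hard + soft superpixel-guided mask loss (illustration only).

    pred:        (H, W) predicted foreground probabilities
    coarse_mask: (H, W) binary pseudo-label from the coarse-mask stage
    superpixels: (H, W) non-negative integer superpixel ids (e.g. SLIC, assumed)
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    ids = superpixels.ravel()
    n_sp = int(ids.max()) + 1
    counts = np.bincount(ids, minlength=n_sp)
    covered = np.bincount(ids, weights=coarse_mask.ravel().astype(float), minlength=n_sp)
    frac = covered / np.maximum(counts, 1)  # coarse-mask coverage per superpixel

    # Hard target (assumed): majority vote inside each superpixel, broadcast to pixels.
    hard_target = (frac > 0.5).astype(float)[superpixels]
    # Soft target (assumed): the coverage fraction itself.
    soft_target = frac[superpixels]

    def bce(p, t):
        return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))

    return bce(pred, hard_target) + soft_weight * bce(pred, soft_target)


# Toy example: two superpixels (left/right halves) and a noisy coarse mask.
superpixels = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1]])
coarse_mask = np.array([[1, 1, 1, 0],
                        [1, 0, 0, 0]])
good_pred = np.where(superpixels == 0, 0.9, 0.1)  # agrees with the per-superpixel vote
bad_pred = 1.0 - good_pred                        # disagrees with the vote

loss_good = superpixel_guided_loss(good_pred, coarse_mask, superpixels)
loss_bad = superpixel_guided_loss(bad_pred, coarse_mask, superpixels)
print(round(loss_good, 3), round(loss_bad, 3))
```

Because superpixels follow low-level color and edge cues, voting inside them is one way noisy coarse-mask pixels near object boundaries could be overruled; the abstract does not say whether the paper's hard loss works this way.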
Related papers
- MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation [5.130440339897479]
MaskAttn-UNet is a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism. Our model selectively emphasizes important regions while suppressing irrelevant backgrounds, thereby improving segmentation accuracy in cluttered and complex scenes. Our results show that MaskAttn-UNet achieves accuracy comparable to state-of-the-art methods at significantly lower computational cost than transformer-based models.
(2025-03-11T22:43:26Z)
- ProMerge: Prompt and Merge for Unsupervised Instance Segmentation [4.297070083645049]
Unsupervised instance segmentation aims to segment distinct object instances in an image without relying on human-labeled data.
Recent state-of-the-art approaches use self-supervised features to represent images as graphs and solve a generalized eigenvalue system to generate foreground masks.
We propose Prompt and Merge (ProMerge), which leverages self-supervised visual features to obtain initial groupings of patches and applies a strategic merging to these segments.
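The graph-based approach described above (the one ProMerge contrasts itself with, not ProMerge itself) can be sketched as a normalized-cut bipartition: patch features define an affinity graph, and the second-smallest generalized eigenvector of (D - W)x = λDx splits the patches in two. The thresholded affinity, `tau`, `eps`, the split at the mean, and the rule that the side holding the largest |x| entry is foreground are assumptions loosely modeled on TokenCut-style methods; the function name `ncut_foreground` is hypothetical.

```python
import numpy as np


def ncut_foreground(features, tau=0.5, eps=1e-5):
    """Hypothetical normalized-cut foreground split over N patch features (N, d)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T  # cosine similarity between patches
    W = np.where(sim > tau, 1.0, eps)  # thresholded affinity; eps keeps the graph connected
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # (D - W) x = lam * D x is equivalent to L_sym y = lam * y with y = D^{1/2} x,
    # where L_sym = I - D^{-1/2} W D^{-1/2} is symmetric, so eigh applies.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    x = d_inv_sqrt * eigvecs[:, 1]  # second-smallest generalized eigenvector
    bipartition = x > x.mean()
    # Assumed foreground rule: the side holding the entry with the largest |x|.
    seed = int(np.argmax(np.abs(x)))
    return bipartition if bipartition[seed] else ~bipartition


# Toy "patch features": 3 patches near one direction, 5 near another.
features = np.array([[1.0, 0.0], [0.95, 0.05], [0.9, 0.1],
                     [0.0, 1.0], [0.05, 0.95], [0.1, 0.9], [0.02, 1.0], [0.08, 0.92]])
fg = ncut_foreground(features)
print(fg)
```

The symmetric rewrite avoids needing a generalized eigensolver: NumPy's `eigh` handles only standard symmetric problems, and scaling by D^{-1/2} converts one into the other.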
(2024-09-27T17:59:42Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
(2024-08-06T14:50:48Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
(2023-12-22T02:31:31Z)
- HAISTA-NET: Human Assisted Instance Segmentation Through Attention [3.073046540587735]
We propose a novel approach to enable more precise predictions and generate higher-quality segmentation masks.
Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries.
We show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former.
(2023-05-04T18:39:14Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose exploiting shape information to compensate for the texture bias of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
(2022-08-08T17:25:31Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
(2022-06-13T17:59:43Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
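Mask classification as summarized here separates "which pixels" from "which class". Below is a minimal NumPy sketch of turning N (class, mask) predictions into a per-pixel semantic map, assuming each query's class distribution has a trailing "no object" entry and that per-pixel class scores are summed over queries. The function name `semantic_inference` is hypothetical, and this is not the official MaskFormer code.

```python
import numpy as np


def semantic_inference(class_probs, mask_probs):
    """Hypothetical sketch: combine N (class, mask) predictions into a semantic map.

    class_probs: (N, K + 1) per-query class probabilities; last column = "no object"
    mask_probs:  (N, H, W) per-query mask probabilities (after a sigmoid)
    """
    # score[c, h, w] = sum_i p_i(c) * m_i[h, w]; the "no object" column is dropped.
    scores = np.einsum("nc,nhw->chw", class_probs[:, :-1], mask_probs)
    return scores.argmax(axis=0)  # (H, W) class index per pixel


# Two queries, two real classes, a 1 x 2 image.
class_probs = np.array([[0.9, 0.05, 0.05],
                        [0.1, 0.8, 0.1]])
mask_probs = np.array([[[0.9, 0.1]],
                       [[0.2, 0.95]]])
seg = semantic_inference(class_probs, mask_probs)
print(seg)  # prints [[0 1]]
```

Weighting each mask by its query's class probability lets a confident query dominate the pixels its mask covers, so a per-pixel labeling falls out of mask classification without predicting classes pixel by pixel.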
(2021-07-13T17:59:50Z)
- LevelSet R-CNN: A Deep Variational Method for Instance Segmentation [79.20048372891935]
Currently, many state-of-the-art models are based on the Mask R-CNN framework.
We propose LevelSet R-CNN, which combines the best of both worlds by integrating powerful learned feature representations end-to-end with a variational (level-set) segmentation framework.
We demonstrate the effectiveness of our approach on COCO and Cityscapes datasets.
(2020-07-30T17:52:18Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame rates and a risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
(2020-02-27T21:58:06Z)