Learning Accurate Segmentation Purely from Self-Supervision
- URL: http://arxiv.org/abs/2602.23759v1
- Date: Fri, 27 Feb 2026 07:36:32 GMT
- Title: Learning Accurate Segmentation Purely from Self-Supervision
- Authors: Zuyao You, Zuxuan Wu, Yu-Gang Jiang
- Abstract summary: Selfment is a fully self-supervised framework that segments objects directly from raw images without human labels. Selfment sets new state-of-the-art (SoTA) results across multiple benchmarks.
- Score: 87.78965637247107
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurately segmenting objects without any manual annotations remains one of the core challenges in computer vision. In this work, we introduce Selfment, a fully self-supervised framework that segments foreground objects directly from raw images without human labels, pretrained segmentation models, or any post-processing. Selfment first constructs patch-level affinity graphs from self-supervised features and applies Normalized Cuts (NCut) to obtain an initial coarse foreground--background separation. We then introduce Iterative Patch Optimization (IPO), a feature-space refinement procedure that progressively enforces spatial coherence and semantic consistency through iterative patch clustering. The refined masks are subsequently used as supervisory signals to train a lightweight segmentation head with contrastive and region-consistency objectives, allowing the model to learn stable and transferable object representations. Despite its simplicity and complete absence of manual supervision, Selfment sets new state-of-the-art (SoTA) results across multiple benchmarks. It achieves substantial improvements on $F_{\max}$ over previous unsupervised saliency detection methods on ECSSD ($+4.0\%$), HKUIS ($+4.6\%$), and PASCAL-S ($+5.7\%$). Moreover, without any additional fine-tuning, Selfment demonstrates remarkable zero-shot generalization to camouflaged object detection tasks (e.g., $0.910$ $S_m$ on CHAMELEON and $0.792$ $F_\beta^\omega$ on CAMO), outperforming all existing unsupervised approaches and even rivaling the SoTA fully supervised methods.
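The first stage of the pipeline described in the abstract (a patch-affinity graph followed by an NCut bipartition) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature source, the cosine-affinity threshold `tau`, and the "smaller partition is foreground" convention are all assumptions.

```python
import numpy as np

def ncut_foreground(feats, tau=0.2):
    """Coarse foreground/background split via a normalized-cut style
    spectral bipartition of a patch-affinity graph.

    feats: (N, D) array of patch features (e.g. from a self-supervised ViT).
    Returns a boolean array of length N (True = foreground patches).
    """
    # Cosine-similarity affinity; weak edges are damped to keep the graph connected.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    W = f @ f.T
    W = np.where(W > tau, W, 1e-6)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    dinv = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - (dinv[:, None] * W * dinv[None, :])
    # The second-smallest eigenvector (Fiedler vector) encodes the bipartition.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    mask = fiedler > fiedler.mean()
    # Assumed convention: the smaller partition is taken as foreground.
    if mask.sum() > len(mask) / 2:
        mask = ~mask
    return mask
```

The refined masks from this stage would then serve as pseudo-labels for the segmentation head; the IPO refinement and the contrastive/region-consistency objectives are not shown here.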
Related papers
- PANC: Prior-Aware Normalized Cut for Object Segmentation [0.0]
We propose a weakly supervised spectral segmentation framework that uses a minimal set of annotated visual tokens. We report strong results on homogeneous, fine-grained, and texture-limited domains. For multi-object benchmarks, the framework showcases explicit, user-controllable semantic segmentation.
arXiv Detail & Related papers (2026-02-06T18:07:20Z) - ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection [7.457821910654639]
This work avoids training by integrating large pre-trained models such as SAM-2 and Owl-v2 with temporal information in a modular pipeline. Our approach also surpasses supervised methods, increasing the F-measure from 0.476 to 0.628.
arXiv Detail & Related papers (2025-04-10T06:24:54Z) - DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model [67.56918651825056]
The performance of object detection lags behind that of instance segmentation (i.e., performance imbalance) when investigating the intermediate results from the first transformer decoder layer of MaskDINO.
This paper proposes the DI-MaskDINO model, the core idea of which is to improve the final performance by alleviating the detection-segmentation imbalance.
DI-MaskDINO outperforms existing joint object detection and instance segmentation models on COCO and BDD100K benchmarks.
arXiv Detail & Related papers (2024-10-22T05:22:49Z) - Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - Unsupervised Universal Image Segmentation [59.0383635597103]
We propose an Unsupervised Universal model (U2Seg) adept at performing various image segmentation tasks.
U2Seg generates pseudo semantic labels for these segmentation tasks via leveraging self-supervised models.
We then self-train the model on these pseudo semantic labels, yielding substantial performance gains.
arXiv Detail & Related papers (2023-12-28T18:59:04Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising because it can discover objects without class-specific supervision.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Unsupervised Semantic Segmentation by Distilling Feature Correspondences [94.73675308961944]
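The PCA-based localization mentioned in the entry above can be illustrated with a small sketch. This is a generic interpretation, not that paper's code: projecting centered patch features onto the first principal component and thresholding, with the smaller region assumed to be the object.

```python
import numpy as np

def pca_localize(feats, grid_hw):
    """Localize an object region by projecting patch features onto their
    first principal component and thresholding the projection at zero.

    feats: (N, D) patch features; grid_hw: (H, W) patch grid with H * W == N.
    Returns an (H, W) boolean mask.
    """
    X = feats - feats.mean(axis=0, keepdims=True)
    # First right singular vector of the centered data = first principal direction.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[0]
    mask = proj > 0
    # Assumed convention: the smaller region is the object.
    if mask.sum() > len(mask) / 2:
        mask = ~mask
    return mask.reshape(grid_hw)
```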
Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation.
We present STEGO, a novel framework that distills unsupervised features into high-quality discrete semantic labels.
STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff and Cityscapes challenges.
arXiv Detail & Related papers (2022-03-16T06:08:47Z) - FreeSOLO: Learning to Segment Objects without Annotations [191.82134817449528]
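The "features into discrete labels" step in the STEGO entry above can be approximated by a much simpler stand-in: clustering patch features with k-means to obtain discrete pseudo-labels. STEGO itself trains a projection head with a correspondence-distillation loss; the sketch below only illustrates the discretization idea, and all names and parameters are assumptions.

```python
import numpy as np

def kmeans_pseudo_labels(feats, k=3, iters=20, seed=0):
    """Assign each patch feature a discrete pseudo-label via plain k-means.

    feats: (N, D) float array of patch features.
    Returns an integer label array of shape (N,), values in [0, k).
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct random data points.
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        for j in range(k):
            pts = feats[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels
```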
We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO.
We also present a novel localization-aware pre-training framework, in which objects can be discovered from complicated scenes in an unsupervised manner.
arXiv Detail & Related papers (2022-02-24T16:31:44Z) - Towards Single Stage Weakly Supervised Semantic Segmentation [2.28438857884398]
We present a single-stage approach to weakly supervised semantic segmentation.
We use point annotations to generate reliable, on-the-fly pseudo-masks.
We significantly outperform other SOTA WSSS methods on recent real-world datasets.
arXiv Detail & Related papers (2021-06-18T18:34:50Z) - Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence [14.79639149658596]
We propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations.
Our method achieves a new state-of-the-art performance on six benchmarks.
arXiv Detail & Related papers (2020-12-08T12:49:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.