Related papers: EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

URL: http://arxiv.org/abs/2403.01482v4
Date: Fri, 5 Apr 2024 16:11:19 GMT
Title: EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Authors: Chanyoung Kim, Woojung Han, Dayun Ju, Seong Jae Hwang,
Abstract summary: We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from semantic similarity matrix. We guide our model to learn object-level representations with intra- and inter-image object-feature consistency. Experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results.
Score: 5.476136494434766
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Semantic segmentation has innately relied on extensive pixel-level annotated data, leading to the emergence of unsupervised methodologies. Among them, leveraging self-supervised Vision Transformers for unsupervised semantic segmentation (USS) has been making steady progress with expressive deep features. Yet, for semantically segmenting images with complex objects, a predominant challenge remains: the lack of explicit object-level semantic encoding in patch-level features. This technical limitation often leads to inadequate segmentation of complex objects with diverse structures. To address this gap, we present a novel approach, EAGLE, which emphasizes object-centric representation learning for unsupervised semantic segmentation. Specifically, we introduce EiCue, a spectral technique providing semantic and structural cues through an eigenbasis derived from the semantic similarity matrix of deep image features and color affinity from an image. Further, by incorporating our object-centric contrastive loss with EiCue, we guide our model to learn object-level representations with intra- and inter-image object-feature consistency, thereby enhancing semantic accuracy. Extensive experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results of EAGLE with accurate and consistent semantic segmentation across complex scenes.

Related papers

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [47.047267066525265]
We introduce a novel approach that incorporates object-level contextual knowledge within images. Our proposed approach achieves state-of-the-art performance with strong generalizability across diverse datasets.
arXiv Detail & Related papers (2024-11-26T06:34:48Z)
Subobject-level Image Tokenization [60.80949852899857]
Patch-based image tokenization ignores the morphology of the visual world. Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation. We show that subobject tokenization enables faster convergence and better generalization while using fewer visual tokens.
arXiv Detail & Related papers (2024-02-22T06:47:44Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
Framework-agnostic Semantically-aware Global Reasoning for Segmentation [29.69187816377079]
We propose a component that learns to project image features into latent representations and reason between them. Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint. Our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks.
arXiv Detail & Related papers (2022-12-06T21:42:05Z)
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem. We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene. By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv Detail & Related papers (2022-05-16T17:47:44Z)
Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue. Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 17%-3%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations. We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios. Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z)
Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences. We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion. In this paper, a new paradigm for semantic segmentation is proposed. Our insight is that appealing performance of semantic segmentation requires textitexplicitly modeling the object textitbody and textitedge, which correspond to the high and low frequency of the image. We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference. Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.