ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
- URL: http://arxiv.org/abs/2210.05944v3
- Date: Thu, 30 Mar 2023 03:12:05 GMT
- Title: ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
- Authors: Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song,
Chang Liu, Li Yuan, Jie Chen
- Abstract summary: Self-supervised visual pre-training models have shown great promise in representing pixel-level semantic relationships.
In this work, we investigate the pixel-level semantic aggregation in self-supervised pre-trained models used as image encoders.
We propose the Adaptive Concept Generator (ACG), which adaptively maps learnable prototypes to informative concepts for each image.
- Score: 17.019848796027485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, self-supervised large-scale visual pre-training models have shown
great promise in representing pixel-level semantic relationships, significantly
promoting the development of unsupervised dense prediction tasks, e.g.,
unsupervised semantic segmentation (USS). The extracted relationship among
pixel-level representations typically contains rich class-aware information, in
that semantically identical pixel embeddings gather together in the
representation space to form sophisticated concepts. However, leveraging the
learned models
to ascertain semantically consistent pixel groups or regions in the image is
non-trivial, since over- or under-clustering overwhelms the conceptualization
procedure under various semantic distributions of different images. In this
work, we investigate the pixel-level semantic aggregation in self-supervised
ViT pre-trained models used as image encoders and propose the Adaptive
Conceptualization approach for USS, termed ACSeg. Concretely, we explicitly
encode concepts into learnable prototypes and design the Adaptive Concept
Generator (ACG), which adaptively maps these prototypes to informative concepts
for each image. Meanwhile, considering the scene complexity of different
images, we propose the modularity loss to optimize the ACG independently of the
concept number based on estimating the intensity of pixel pairs belonging to
the same concept. Finally, we turn the USS task into classifying the discovered
concepts in an unsupervised manner. Extensive experiments with state-of-the-art
results demonstrate the effectiveness of the proposed ACSeg.
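
The pipeline described in the abstract lends itself to a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the authors' implementation: the cross-attention update, the hyperparameters, and the pairwise binary cross-entropy used as a simplified stand-in for the modularity objective are all assumed for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConceptGenerator(nn.Module):
    """Illustrative sketch of the ACG: learnable prototypes are adaptively
    mapped to per-image concepts by attending to pixel-level features from a
    frozen self-supervised ViT. The attention design and hyperparameters are
    assumptions, not the authors' exact architecture."""

    def __init__(self, dim=384, num_prototypes=5, num_layers=3):
        super().__init__()
        # Concepts are explicitly encoded as learnable prototypes.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, pixel_feats):
        # pixel_feats: (B, N, dim) pixel-level embeddings from the encoder.
        B = pixel_feats.size(0)
        concepts = self.prototypes.unsqueeze(0).expand(B, -1, -1)
        for attn in self.layers:
            # Prototypes query the pixel features, adapting to this image.
            update, _ = attn(concepts, pixel_feats, pixel_feats)
            concepts = concepts + update
        # Soft-assign each pixel to its most similar concept.
        logits = pixel_feats @ concepts.transpose(1, 2)       # (B, N, P)
        assignment = logits.softmax(dim=-1)
        return concepts, assignment

def modularity_loss(pixel_feats, assignment, tau=0.3):
    """Simplified pairwise stand-in for the modularity objective: pixel
    pairs whose feature affinity exceeds a threshold are encouraged to
    share a concept, independently of how many concepts the image uses."""
    feats = F.normalize(pixel_feats, dim=-1)
    affinity = feats @ feats.transpose(1, 2)                  # (B, N, N)
    same_concept = assignment @ assignment.transpose(1, 2)    # P(same concept)
    target = (affinity > tau).float()         # estimated same-concept pairs
    return F.binary_cross_entropy(same_concept.clamp(1e-6, 1 - 1e-6), target)
```

In the full method, the per-image concepts discovered this way are then classified in an unsupervised manner to yield the final segmentation.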
Related papers
- A Spitting Image: Modular Superpixel Tokenization in Vision Transformers [0.0]
Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image.
We propose a modular superpixel tokenization strategy which decouples tokenization and feature extraction.
arXiv Detail & Related papers (2024-08-14T17:28:58Z) - Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z) - Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z) - CEIR: Concept-based Explainable Image Representation Learning [0.4198865250277024]
We introduce Concept-based Explainable Image Representation (CEIR) to derive high-quality representations without label dependency.
Our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10.
CEIR can seamlessly extract related concepts from open-world images without fine-tuning.
arXiv Detail & Related papers (2023-12-17T15:37:41Z) - Cross-Modal Concept Learning and Inference for Vision-Language Models [31.463771883036607]
In existing fine-tuning methods, the class-specific text description is matched against the whole image.
We develop a new method called cross-modal concept learning and inference (CCLI).
Our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts.
arXiv Detail & Related papers (2023-07-28T10:26:28Z) - Unsupervised Hashing with Semantic Concept Mining [37.215530006668935]
In this work, we propose a novel Unsupervised Hashing with Semantic Concept Mining method, called UHSCM, which leverages a high-quality similarity matrix.
With the semantic similarity matrix as guiding information, a novel hashing loss with a modified contrastive-loss-based regularization term is proposed to optimize the hashing network.
arXiv Detail & Related papers (2022-09-23T08:25:24Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis has mainly followed the de facto choice of Generative Adversarial Networks (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights (a minimal sketch follows at the end of this list).
arXiv Detail & Related papers (2020-07-13T18:05:36Z) - Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)