Learning Representations for Clustering via Partial Information
Discrimination and Cross-Level Interaction
- URL: http://arxiv.org/abs/2401.13503v1
- Date: Wed, 24 Jan 2024 14:51:33 GMT
- Title: Learning Representations for Clustering via Partial Information
Discrimination and Cross-Level Interaction
- Authors: Hai-Xin Zhang, Dong Huang, Hua-Bao Ling, Guang-Yu Zhang, Wei-jun Sun
and Zi-hao Wen
- Abstract summary: We present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction.
In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two paralleled augmented views is formulated.
- Score: 5.101836008369192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel deep image clustering approach termed PICI,
which enforces the partial information discrimination and the cross-level
interaction in a joint learning framework. In particular, we leverage a
Transformer encoder as the backbone, through which the masked image modeling
with two paralleled augmented views is formulated. After deriving the class
tokens from the masked images by the Transformer encoder, three partial
information learning modules are further incorporated, including the PISD
module for training the auto-encoder via masked image reconstruction, the PICD
module for employing two levels of contrastive learning, and the CLI module for
mutual interaction between the instance-level and cluster-level subspaces.
Extensive experiments have been conducted on six real-world image datasets,
which demononstrate the superior clustering performance of the proposed PICI
approach over the state-of-the-art deep clustering approaches. The source code
is available at https://github.com/Regan-Zhang/PICI.
Related papers
- Masked Image Modeling Boosting Semi-Supervised Semantic Segmentation [38.55611683982936]
We introduce a novel class-wise masked image modeling that independently reconstructs different image regions according to their respective classes.
We develop a feature aggregation strategy that minimizes the distances between features corresponding to the masked and visible parts within the same class.
In semantic space, we explore the application of masked image modeling to enhance regularization.
arXiv Detail & Related papers (2024-11-13T16:42:07Z) - P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation [8.46409964236009]
Diffusion models and multi-scale features are essential components in semantic segmentation tasks.
We propose a new model for semantic segmentation known as the diffusion model with parallel multi-scale branches.
Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets.
arXiv Detail & Related papers (2024-05-30T19:40:08Z) - Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised
Multi-Organ Segmentation [12.798684146496754]
We propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS.
In Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images.
In Stage 2, we present an organ-aware local contrastive learning to further attract the class representations.
arXiv Detail & Related papers (2024-03-06T07:39:33Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-the-of-art performance on 3-of-the-level object recognition.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - Deep Image Clustering with Contrastive Learning and Multi-scale Graph
Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN)
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z) - Cross-View Panorama Image Synthesis [68.35351563852335]
PanoGAN is a novel adversarial feedback GAN framework named.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z) - CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image framework (CRIS)
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.