Improving k-Means Clustering Performance with Disentangled Internal
Representations
- URL: http://arxiv.org/abs/2006.04535v1
- Date: Fri, 5 Jun 2020 11:32:34 GMT
- Title: Improving k-Means Clustering Performance with Disentangled Internal
Representations
- Authors: Abien Fred Agarap, Arnulfo P. Azcarraga
- Abstract summary: We propose a simpler approach of optimizing the entanglement of the learned latent code representation of an autoencoder.
Using our proposed approach, the test clustering accuracy was 96.2% on the MNIST dataset, 85.6% on the Fashion-MNIST dataset, and 79.2% on the EMNIST Balanced dataset, outperforming our baseline models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep clustering algorithms combine representation learning and clustering by
jointly optimizing a clustering loss and a non-clustering loss. In such
methods, a deep neural network is used for representation learning together
with a clustering network. Instead of following this framework to improve
clustering performance, we propose a simpler approach of optimizing the
entanglement of the learned latent code representation of an autoencoder. We
define entanglement as how close pairs of points from the same class or
structure are, relative to pairs of points from different classes or
structures. To measure the entanglement of data points, we use the soft nearest
neighbor loss, and expand it by introducing an annealing temperature factor.
Using our proposed approach, the test clustering accuracy was 96.2% on the
MNIST dataset, 85.6% on the Fashion-MNIST dataset, and 79.2% on the EMNIST
Balanced dataset, outperforming our baseline models.
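The entanglement measure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the soft nearest neighbor loss over a batch of latent codes, not the authors' implementation. The function names, the `eps` stabilizer, and the form of the annealing schedule (temperature decaying as `1 / (1 + epoch) ** gamma`, with `gamma=0.55`) are assumptions made for this example, since the abstract does not give the exact schedule.

```python
import numpy as np


def soft_nearest_neighbor_loss(features, labels, temperature=1.0, eps=1e-9):
    """Soft nearest neighbor loss over a batch (illustrative sketch).

    Low values mean points sit closer to same-class points than to
    points of other classes (low entanglement); high values mean the
    classes are mixed together (high entanglement).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (batch, batch).
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Similarity kernel scaled by the temperature; self-pairs are excluded.
    kernel = np.exp(-sq_dists / temperature)
    np.fill_diagonal(kernel, 0.0)

    # Mask selecting pairs that share a label.
    same_class = labels[:, None] == labels[None, :]

    numerator = np.sum(kernel * same_class, axis=1)
    denominator = np.sum(kernel, axis=1)

    # eps is an assumed stabilizer that avoids log(0) and division by zero.
    return float(-np.mean(np.log(eps + numerator / (eps + denominator))))


def annealed_temperature(epoch, gamma=0.55):
    """Hypothetical annealing schedule: temperature shrinks as training proceeds."""
    return 1.0 / (1.0 + epoch) ** gamma


# Two tight, well-separated groups of latent codes.
codes = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
disentangled = soft_nearest_neighbor_loss(codes, [0, 0, 1, 1])
entangled = soft_nearest_neighbor_loss(codes, [0, 1, 0, 1])
```

With labels that match the spatial grouping, the loss is close to zero. With labels that cut across the groups, it is much larger. In a training loop, the value from `annealed_temperature(epoch)` would be passed as `temperature`, and the loss would be added to the autoencoder's reconstruction loss.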
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating whether nodes have been corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To better exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Cluster Analysis with Deep Embeddings and Contrastive Learning [0.0]
This work proposes a novel framework for performing image clustering from deep embeddings.
Our approach jointly learns representations and predicts cluster centers in an end-to-end manner.
Our framework performs on par with widely accepted clustering methods and outperforms the state-of-the-art contrastive learning method on the CIFAR-10 dataset.
arXiv Detail & Related papers (2021-09-26T22:18:15Z) - Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z) - Neural Mixture Models with Expectation-Maximization for End-to-end Deep
Clustering [0.8543753708890495]
In this paper, we realize mixture model-based clustering with a neural network.
We train the network end-to-end via batch-wise EM iterations where the forward pass acts as the E-step and the backward pass acts as the M-step.
Our trained networks outperform single-stage deep clustering methods that still depend on k-means.
arXiv Detail & Related papers (2021-07-06T08:00:58Z) - Meta-learning representations for clustering with infinite Gaussian
mixture models [39.56814839510978]
We propose a meta-learning method that trains neural networks to obtain representations that improve clustering performance.
The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data.
arXiv Detail & Related papers (2021-03-01T02:05:31Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z) - LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.