Discriminative Similarity for Data Clustering
- URL: http://arxiv.org/abs/2109.08675v1
- Date: Fri, 17 Sep 2021 17:56:55 GMT
- Title: Discriminative Similarity for Data Clustering
- Authors: Yingzhen Yang, Ping Li
- Abstract summary: Similarity-based clustering methods separate data into clusters according to the pairwise similarity between the data.
We propose Clustering by Discriminative Similarity (CDS), a novel method which learns discriminative similarity for data clustering.
- Score: 22.067254105193136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity-based clustering methods separate data into clusters according to
the pairwise similarity between the data, and the pairwise similarity is
crucial for their performance. In this paper, we propose Clustering by
Discriminative Similarity (CDS), a novel method which learns discriminative
similarity for data clustering. CDS learns an unsupervised similarity-based
classifier from each data partition, and searches for the optimal partition of
the data by minimizing the generalization error of the learnt classifiers
associated with the data partitions. By generalization analysis via Rademacher
complexity, the generalization error bound for the unsupervised
similarity-based classifier is expressed as the sum of discriminative
similarity between the data from different classes. It is proved that the
derived discriminative similarity can also be induced by the integrated squared
error bound for kernel density classification. In order to evaluate the
performance of the proposed discriminative similarity, we propose a new
clustering method using a kernel as the similarity function, CDS via
unsupervised kernel classification (CDSK), with its effectiveness demonstrated
by experimental results.
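The partition-search idea in the abstract can be illustrated with a small, hedged sketch: under the paper's analysis, the generalization bound reduces to a sum of kernel similarity between points placed in different clusters, so a toy version of the method can brute-force the partition that minimizes cross-cluster similarity. The function names, the Gaussian-kernel choice, and the exhaustive search below are illustrative assumptions for tiny data, not the authors' implementation.

```python
import math
from itertools import product

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian kernel similarity between two points (assumed kernel choice).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))

def cross_cluster_similarity(points, labels, sigma=1.0):
    # Sum of kernel similarity over pairs assigned to different clusters;
    # this plays the role of the discriminative-similarity objective.
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j]:
                total += gaussian_kernel(points[i], points[j], sigma)
    return total

def brute_force_partition(points, k=2, sigma=1.0):
    # Exhaustively search all label assignments (feasible only for tiny data)
    # for the partition minimizing between-cluster similarity.
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # require all k clusters to be non-empty
            continue
        cost = cross_cluster_similarity(points, labels, sigma)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost

# Two well-separated groups: the minimizer keeps each pair together.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels, cost = brute_force_partition(pts, k=2)
```

Real data of course rules out exhaustive search; the point of the sketch is only the objective being minimized, not the optimization strategy.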
Related papers
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations.
We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures.
We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
- Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes [0.0]
This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm.
We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function.
We demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data.
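The extra-node construction described above can be sketched as follows, with the caveat that this is a reading of the one-line summary: the affinity matrix is extended with one node per distinct category value, and each data point gets a unit-weight edge to its category nodes. All weights and the Gaussian similarity for the numerical part are illustrative assumptions, not the paper's exact construction.

```python
import math

def mixed_affinity(numeric, categorical, sigma=1.0):
    # Extended affinity matrix over n data nodes plus one extra node per
    # distinct categorical value. Data points get a Gaussian similarity to
    # each other and a unit-weight edge to each of their category nodes.
    # (Hypothetical sketch; edge weights and normalization are assumptions.)
    n = len(numeric)
    cats = sorted({c for row in categorical for c in row})
    index = {c: n + k for k, c in enumerate(cats)}  # category node indices
    size = n + len(cats)
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(numeric[i], numeric[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2 * sigma ** 2))
        for c in categorical[i]:
            A[i][index[c]] = A[index[c]][i] = 1.0
    return A

# Two points with one numeric feature and one categorical feature each.
A = mixed_affinity([(0.0,), (1.0,)], [["red"], ["blue"]])
```

Standard spectral clustering would then be run on this enlarged graph, with the category nodes discarded from the final assignment.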
arXiv Detail & Related papers (2024-03-08T20:49:49Z)
- DCSI -- An improved measure of cluster separability based on separation and connectedness [0.0]
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets.
The central aspects of separability for density-based clustering are between-class separation and within-class connectedness.
A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI.
arXiv Detail & Related papers (2023-10-19T15:01:57Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Contrastive Fine-grained Class Clustering via Generative Adversarial Networks [9.667133604169829]
We introduce C3-GAN, a method that leverages the categorical inference power of InfoGAN by applying contrastive learning.
C3-GAN achieved state-of-the-art clustering performance on four fine-grained benchmark datasets.
arXiv Detail & Related papers (2021-12-30T08:57:11Z)
- Shift of Pairwise Similarities for Data Clustering [7.462336024223667]
We consider the case where the regularization term is the sum of the squared size of the clusters, and then generalize it to adaptive regularization of the pairwise similarities.
This leads to shifting (adaptively) the pairwise similarities which might make some of them negative.
We then propose an efficient local search optimization algorithm with fast theoretical convergence rate to solve the new clustering problem.
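The equivalence hinted at above can be made concrete with a short, hypothetical sketch: since each cluster's squared size counts its internal ordered pairs plus its members, penalizing the sum of squared cluster sizes by a factor lam is the same, up to the constant lam * n, as subtracting 2 * lam from every pairwise similarity, which can push small similarities below zero. The max-similarity objective below is a generic formulation for illustration, not the paper's exact one.

```python
from collections import Counter

def within_cluster_score(S, labels, shift=0.0):
    # Sum of (shifted) pairwise similarities over same-cluster pairs (i < j).
    n = len(S)
    return sum(S[i][j] - shift
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def regularized_score(S, labels, lam):
    # Max-similarity objective penalized by the sum of squared cluster sizes.
    sizes = Counter(labels).values()
    return within_cluster_score(S, labels) - lam * sum(s * s for s in sizes)
```

For any similarity matrix S and labeling, `regularized_score(S, labels, lam)` equals `within_cluster_score(S, labels, shift=2 * lam) - lam * len(S)`, which is the "shift of pairwise similarities" view of the regularizer.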
arXiv Detail & Related papers (2021-10-25T16:55:07Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, up to a 19% (39%) performance improvement over the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.