Discriminative Similarity for Data Clustering
- URL: http://arxiv.org/abs/2109.08675v1
- Date: Fri, 17 Sep 2021 17:56:55 GMT
- Title: Discriminative Similarity for Data Clustering
- Authors: Yingzhen Yang, Ping Li
- Abstract summary: Similarity-based clustering methods separate data into clusters according to the pairwise similarity between the data.
We propose Clustering by Discriminative Similarity (CDS), a novel method which learns discriminative similarity for data clustering.
- Score: 22.067254105193136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity-based clustering methods separate data into clusters according to
the pairwise similarity between the data, and the pairwise similarity is
crucial for their performance. In this paper, we propose Clustering by
Discriminative Similarity (CDS), a novel method which learns discriminative
similarity for data clustering. CDS learns an unsupervised similarity-based
classifier from each data partition, and searches for the optimal partition of
the data by minimizing the generalization error of the learnt classifiers
associated with the data partitions. By generalization analysis via Rademacher
complexity, the generalization error bound for the unsupervised
similarity-based classifier is expressed as the sum of discriminative
similarity between the data from different classes. It is proved that the
derived discriminative similarity can also be induced by the integrated squared
error bound for kernel density classification. In order to evaluate the
performance of the proposed discriminative similarity, we propose a new
clustering method using a kernel as the similarity function, CDS via
unsupervised kernel classification (CDSK), with its effectiveness demonstrated
by experimental results.
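The partition-search idea in the abstract can be illustrated with a small, hedged sketch: under the paper's analysis, the generalization bound reduces to a sum of kernel similarity between points placed in different clusters, so a toy version of the method can brute-force the partition that minimizes cross-cluster similarity. The function names, the Gaussian-kernel choice, and the exhaustive search below are illustrative assumptions for tiny data, not the authors' implementation.

```python
import math
from itertools import product

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian kernel similarity between two points (assumed kernel choice).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))

def cross_cluster_similarity(points, labels, sigma=1.0):
    # Sum of kernel similarity over pairs assigned to different clusters;
    # this plays the role of the discriminative-similarity objective.
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j]:
                total += gaussian_kernel(points[i], points[j], sigma)
    return total

def brute_force_partition(points, k=2, sigma=1.0):
    # Exhaustively search all label assignments (feasible only for tiny data)
    # for the partition minimizing between-cluster similarity.
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # require all k clusters to be non-empty
            continue
        cost = cross_cluster_similarity(points, labels, sigma)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost

# Two well-separated groups: the minimizer keeps each pair together.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels, cost = brute_force_partition(pts, k=2)
```

Real data of course rules out exhaustive search; the point of the sketch is only the objective being minimized, not the optimization strategy.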
Related papers
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations.
We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures.
We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
- Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes [0.0]
This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm.
We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function.
We demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data.
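The extra-node construction described above can be sketched as follows, with the caveat that this is a reading of the one-line summary: the affinity matrix is extended with one node per distinct category value, and each data point gets a unit-weight edge to its category nodes. All weights and the Gaussian similarity for the numerical part are illustrative assumptions, not the paper's exact construction.

```python
import math

def mixed_affinity(numeric, categorical, sigma=1.0):
    # Extended affinity matrix over n data nodes plus one extra node per
    # distinct categorical value. Data points get a Gaussian similarity to
    # each other and a unit-weight edge to each of their category nodes.
    # (Hypothetical sketch; edge weights and normalization are assumptions.)
    n = len(numeric)
    cats = sorted({c for row in categorical for c in row})
    index = {c: n + k for k, c in enumerate(cats)}  # category node indices
    size = n + len(cats)
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(numeric[i], numeric[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2 * sigma ** 2))
        for c in categorical[i]:
            A[i][index[c]] = A[index[c]][i] = 1.0
    return A

# Two points with one numeric feature and one categorical feature each.
A = mixed_affinity([(0.0,), (1.0,)], [["red"], ["blue"]])
```

Standard spectral clustering would then be run on this enlarged graph, with the category nodes discarded from the final assignment.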
arXiv Detail & Related papers (2024-03-08T20:49:49Z)
- DCSI -- An improved measure of cluster separability based on separation and connectedness [0.0]
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets.
The central aspects of separability for density-based clustering are between-class separation and within-class connectedness.
A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI.
arXiv Detail & Related papers (2023-10-19T15:01:57Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Contrastive Fine-grained Class Clustering via Generative Adversarial Networks [9.667133604169829]
We introduce C3-GAN, a method that leverages the categorical inference power of InfoGAN by applying contrastive learning.
C3-GAN achieved state-of-the-art clustering performance on four fine-grained benchmark datasets.
arXiv Detail & Related papers (2021-12-30T08:57:11Z)
- Shift of Pairwise Similarities for Data Clustering [7.462336024223667]
We consider the case where the regularization term is the sum of the squared size of the clusters, and then generalize it to adaptive regularization of the pairwise similarities.
This leads to shifting (adaptively) the pairwise similarities which might make some of them negative.
We then propose an efficient local search optimization algorithm with fast theoretical convergence rate to solve the new clustering problem.
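The equivalence hinted at above can be made concrete with a short, hypothetical sketch: since each cluster's squared size counts its internal ordered pairs plus its members, penalizing the sum of squared cluster sizes by a factor lam is the same, up to the constant lam * n, as subtracting 2 * lam from every pairwise similarity, which can push small similarities below zero. The max-similarity objective below is a generic formulation for illustration, not the paper's exact one.

```python
from collections import Counter

def within_cluster_score(S, labels, shift=0.0):
    # Sum of (shifted) pairwise similarities over same-cluster pairs (i < j).
    n = len(S)
    return sum(S[i][j] - shift
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def regularized_score(S, labels, lam):
    # Max-similarity objective penalized by the sum of squared cluster sizes.
    sizes = Counter(labels).values()
    return within_cluster_score(S, labels) - lam * sum(s * s for s in sizes)
```

For any similarity matrix S and labeling, `regularized_score(S, labels, lam)` equals `within_cluster_score(S, labels, shift=2 * lam) - lam * len(S)`, which is the "shift of pairwise similarities" view of the regularizer.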
arXiv Detail & Related papers (2021-10-25T16:55:07Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, up to a 19% (39%) performance improvement over the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.