Recovery Guarantees for Kernel-based Clustering under Non-parametric
Mixture Models
- URL: http://arxiv.org/abs/2110.09476v1
- Date: Mon, 18 Oct 2021 17:23:54 GMT
- Title: Recovery Guarantees for Kernel-based Clustering under Non-parametric
Mixture Models
- Authors: Leena Chennuru Vankadara, Sebastian Bordt, Ulrike von Luxburg,
Debarghya Ghoshdastidar
- Abstract summary: We study the statistical performance of kernel-based clustering algorithms under non-parametric mixture models.
We establish a key equivalence between kernel-based data-clustering and kernel density-based clustering.
- Score: 26.847612684502998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the ubiquity of kernel-based clustering, surprisingly few statistical
guarantees exist beyond settings that consider strong structural assumptions on
the data generation process. In this work, we take a step towards bridging this
gap by studying the statistical performance of kernel-based clustering
algorithms under non-parametric mixture models. We provide necessary and
sufficient separability conditions under which these algorithms can
consistently recover the underlying true clustering. Our analysis provides
guarantees for kernel clustering approaches without structural assumptions on
the form of the component distributions. Additionally, we establish a key
equivalence between kernel-based data-clustering and kernel density-based
clustering. This enables us to provide consistency guarantees for kernel-based
estimators of non-parametric mixture models. Along with theoretical
implications, this connection could have practical implications, including in
the systematic choice of the bandwidth of the Gaussian kernel in the context of
clustering.
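The kind of procedure the paper analyzes can be illustrated with a minimal NumPy sketch: Gaussian-kernel spectral clustering on two well-separated components. The data, bandwidth, and nearest-seed assignment step below are illustrative assumptions, not the paper's method; the comment on the bandwidth reflects the abstract's noted connection to kernel density estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated components (illustrative data, not from the paper).
X = np.vstack([rng.normal(-3.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])

# Gaussian (RBF) kernel matrix; per the paper's equivalence, this bandwidth
# plays the same role as the bandwidth of a kernel density estimator.
bandwidth = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2 * bandwidth ** 2))

# Symmetrically normalized kernel, as in spectral clustering.
d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

# Top-2 eigenvectors give a spectral embedding; under sufficient separation,
# points from the same component embed along a common direction.
_, vecs = np.linalg.eigh(M)
emb = vecs[:, -2:]
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Assign each point to the nearer of two seed embeddings (a crude stand-in
# for the k-means step of spectral clustering).
d0 = np.linalg.norm(emb - emb[0], axis=1)
d1 = np.linalg.norm(emb - emb[-1], axis=1)
labels = (d1 < d0).astype(int)
```

With this degree of separation the kernel matrix is nearly block diagonal, so the top two eigenvectors recover the component indicators, which is the regime the paper's separability conditions formalize.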
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering [21.685153346752124]
Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities.
We introduce a novel method that systematically integrates both kernel correlation and dissimilarity.
By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information.
arXiv Detail & Related papers (2024-03-06T04:24:43Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
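The median-of-means idea referenced in the title above can be sketched in isolation: split the sample into random blocks, average within each block, and take the coordinate-wise median of the block means, which is robust to a small number of gross outliers. The block count and data here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def median_of_means(x, n_blocks=10, rng=None):
    """Coordinate-wise median-of-means estimate of the mean of x (shape (n, d))."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))
    blocks = np.array_split(idx, n_blocks)           # disjoint random blocks
    block_means = np.stack([x[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)            # median is outlier-robust

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, (500, 2))
x[:5] = 1000.0                      # a few gross outliers
mom = median_of_means(x, n_blocks=20, rng=2)
naive = x.mean(axis=0)              # pulled far off by the outliers
```

Since the 5 outliers can contaminate at most 5 of the 20 blocks, the median over block means stays near the true mean while the naive average does not.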
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel- or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms [12.363083467305787]
Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches.
Existing AHC methods, which are based on a distance measure, have difficulty identifying adjacent clusters with varied densities.
We show that using a data-dependent kernel (instead of a distance or an existing kernel) provides an effective means to address this issue.
arXiv Detail & Related papers (2020-10-12T06:18:38Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
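That positive semi-definiteness observation is easy to verify numerically: each single-partition co-clustering matrix is a sum of outer products of cluster indicator vectors (hence PSD), and the PSM is an average of such matrices. The random partitions below stand in for actual MCMC draws, which this sketch does not simulate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_draws, k = 30, 200, 4

# Stand-in for MCMC output: each draw assigns every point to one of k clusters.
draws = rng.integers(0, k, size=(n_draws, n))

# Posterior similarity matrix: PSM[i, j] = fraction of draws in which
# points i and j are co-clustered.
co = (draws[:, :, None] == draws[:, None, :])   # (n_draws, n, n) booleans
psm = co.mean(axis=0)

# Each per-draw co-clustering matrix is PSD (sum of indicator outer
# products), so their average, the PSM, is PSD as well.
eigvals = np.linalg.eigvalsh(psm)
```

A PSD PSM can therefore be plugged directly into any kernel method, which is what the kernel-learning approaches above exploit.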
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Real Elliptically Skewed Distributions and Their Application to Robust Cluster Analysis [5.137336092866906]
This article proposes a new class of Real Elliptically Skewed (RESK) distributions and associated clustering algorithms.
Non-symmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications.
arXiv Detail & Related papers (2020-06-30T10:44:39Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)
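A minimal version of the feature-ranking idea from the entry above: score each feature, keep the top-s, and run ordinary k-means on the reduced data. The scoring rule (plain variance), the choice of s, and the tiny Lloyd's loop are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def kmeans(X, k, n_iter=50, rng=None):
    """Plain Lloyd's algorithm (illustrative stand-in for the k-means step)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# Two informative features carrying cluster structure plus 20 noise features.
informative = np.vstack([rng.normal(-2, 0.3, (40, 2)),
                         rng.normal(2, 0.3, (40, 2))])
noise = rng.normal(0, 0.3, (80, 20))
X = np.hstack([informative, noise])

# Rank features by variance and keep only the top s before clustering.
s = 2
top = np.argsort(X.var(axis=0))[-s:]
labels = kmeans(X[:, top], k=2, rng=1)
```

Because the between-cluster spread inflates the variance of the informative features, the ranking step discards the noise dimensions and k-means then operates in a 2-dimensional space, which is the sparsity-for-free intuition behind such frameworks.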
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.