Randomly Projected Convex Clustering Model: Motivation, Realization, and
Cluster Recovery Guarantees
- URL: http://arxiv.org/abs/2303.16841v1
- Date: Wed, 29 Mar 2023 16:47:25 GMT
- Title: Randomly Projected Convex Clustering Model: Motivation, Realization, and
Cluster Recovery Guarantees
- Authors: Ziwen Wang, Yancheng Yuan, Jiaming Ma, Tieyong Zeng, Defeng Sun
- Abstract summary: We propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $mathbbRd$ with $K$ hidden clusters.
We prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model can be preserved.
The numerical results presented in this paper also demonstrate that the randomly projected convex clustering model can outperform the randomly projected K-means model in practice.
- Score: 18.521314122101774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a randomly projected convex clustering model for
clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$
with $K$ hidden clusters. Compared to the convex clustering model for
clustering original data with dimension $d$, we prove that, under some mild
conditions, the perfect recovery of the cluster membership assignments of the
convex clustering model, if exists, can be preserved by the randomly projected
convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$,
where $0 < \epsilon < 1$ is some given parameter. We further prove that the
embedding dimension can be improved to be $O(\epsilon^{-2}\log(K))$, which is
independent of the number of data points. Extensive numerical experiment
results will be presented in this paper to demonstrate the robustness and
superior performance of the randomly projected convex clustering model. The
numerical results presented in this paper also demonstrate that the randomly
projected convex clustering model can outperform the randomly projected K-means
model in practice.
Related papers
- A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$mathcalF_rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$mathcalF_rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z) - Distribution free optimality intervals for clustering [1.7513645771137178]
Given data $mathcalD$ and a partition $mathcalC$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data?
This paper introduces a paradigm in which a clustering $mathcalC$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e. the only good clustering up to small perturbations.
arXiv Detail & Related papers (2021-07-30T06:13:56Z) - Sum-of-norms clustering does not separate nearby balls [49.1574468325115]
We show a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure.
We state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
arXiv Detail & Related papers (2021-04-28T13:35:17Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Robust M-Estimation Based Bayesian Cluster Enumeration for Real
Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining optimal number of clusters in a data set is an essential factor in a wide range of applications.
This article generalizes so that it can be used with any arbitrary Really Symmetric (RES) distributed mixture model.
We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z) - Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z) - A new model for natural groupings in high-dimensional data [0.4604003661048266]
Clustering aims to divide a set of points into groups.
Recent experiments have uncovered several high-dimensional datasets that form different binary groupings.
This paper describes a probability model for the data that could explain this phenomenon.
arXiv Detail & Related papers (2019-09-14T02:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.