Efficient Clustering for Stretched Mixtures: Landscape and Optimality
- URL: http://arxiv.org/abs/2003.09960v3
- Date: Sat, 27 Nov 2021 23:49:35 GMT
- Title: Efficient Clustering for Stretched Mixtures: Landscape and Optimality
- Authors: Kaizheng Wang, Yuling Yan, Mateo D\'iaz
- Abstract summary: This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions.
We show that the non-optimal clustering function exhibits desirable geometric properties when the sample size exceeds some constant statistical objectives.
- Score: 4.2111286819721485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers a canonical clustering problem where one receives
unlabeled samples drawn from a balanced mixture of two elliptical distributions
and aims for a classifier to estimate the labels. Many popular methods
including PCA and k-means require individual components of the mixture to be
somewhat spherical, and perform poorly when they are stretched. To overcome
this issue, we propose a non-convex program seeking for an affine transform to
turn the data into a one-dimensional point cloud concentrating around $-1$ and
$1$, after which clustering becomes easy. Our theoretical contributions are
two-fold: (1) we show that the non-convex loss function exhibits desirable
geometric properties when the sample size exceeds some constant multiple of the
dimension, and (2) we leverage this to prove that an efficient first-order
algorithm achieves near-optimal statistical precision without good
initialization. We also propose a general methodology for clustering with
flexible choices of feature transforms and loss objectives.
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Anchor-free Clustering based on Anchor Graph Factorization [17.218481911995365]
We introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF)
AFCAGF innovates in learning the anchor graph, requiring only the computation of pairwise distances between samples.
We evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.
arXiv Detail & Related papers (2024-02-24T02:16:42Z) - Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex convex model for semi/library-based unmixing.
We demonstrate the efficacy of Alternating Methods of sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - A provable initialization and robust clustering method for general mixture models [6.806940901668607]
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data.
Most recent results focus on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian errors.
arXiv Detail & Related papers (2024-01-10T22:56:44Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
How to cluster data using their low dimensional embedded space is still a challenging problem in machine learning.
In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction.
Our algorithm is based on a mixture of sparse Gaussian processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC)
arXiv Detail & Related papers (2023-03-23T20:44:36Z) - Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs)
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.