Efficient Clustering for Stretched Mixtures: Landscape and Optimality
- URL: http://arxiv.org/abs/2003.09960v3
- Date: Sat, 27 Nov 2021 23:49:35 GMT
- Title: Efficient Clustering for Stretched Mixtures: Landscape and Optimality
- Authors: Kaizheng Wang, Yuling Yan, Mateo D\'iaz
- Abstract summary: This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions.
We show that the non-optimal clustering function exhibits desirable geometric properties when the sample size exceeds some constant statistical objectives.
- Score: 4.2111286819721485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers a canonical clustering problem where one receives
unlabeled samples drawn from a balanced mixture of two elliptical distributions
and aims for a classifier to estimate the labels. Many popular methods
including PCA and k-means require individual components of the mixture to be
somewhat spherical, and perform poorly when they are stretched. To overcome
this issue, we propose a non-convex program seeking for an affine transform to
turn the data into a one-dimensional point cloud concentrating around $-1$ and
$1$, after which clustering becomes easy. Our theoretical contributions are
two-fold: (1) we show that the non-convex loss function exhibits desirable
geometric properties when the sample size exceeds some constant multiple of the
dimension, and (2) we leverage this to prove that an efficient first-order
algorithm achieves near-optimal statistical precision without good
initialization. We also propose a general methodology for clustering with
flexible choices of feature transforms and loss objectives.
Related papers
- Anchor-free Clustering based on Anchor Graph Factorization [18.809954521548303]
We introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF)
AFCAGF innovates in learning the anchor graph, requiring only the computation of pairwise distances between samples.
We evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.
arXiv Detail & Related papers (2024-02-24T02:16:42Z) - Fast Semi-supervised Unmixing using Non-convex Optimization [85.95119207126292]
We introduce a novel convex convex model for semi/library-based unmixing.
We demonstrate the efficacy of Alternating Methods of sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - SIGMA: Scale-Invariant Global Sparse Shape Matching [50.385414715675076]
We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for non-rigid shapes.
We show state-of-the-art results for sparse non-rigid matching on several challenging 3D datasets.
arXiv Detail & Related papers (2023-08-16T14:25:30Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
How to cluster data using their low dimensional embedded space is still a challenging problem in machine learning.
In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction.
Our algorithm is based on a mixture of sparse Gaussian processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC)
arXiv Detail & Related papers (2023-03-23T20:44:36Z) - Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - A Method for Handling Multi-class Imbalanced Data by Geometry based
Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs)
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.