Related papers: Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

URL: http://arxiv.org/abs/2003.09960v3
Date: Sat, 27 Nov 2021 23:49:35 GMT
Title: Efficient Clustering for Stretched Mixtures: Landscape and Optimality
Authors: Kaizheng Wang, Yuling Yan, Mateo D\'iaz
Abstract summary: This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions. We show that the non-optimal clustering function exhibits desirable geometric properties when the sample size exceeds some constant statistical objectives.
Score: 4.2111286819721485
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking for an affine transform to turn the data into a one-dimensional point cloud concentrating around $-1$ and $1$, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable geometric properties when the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision without good initialization. We also propose a general methodology for clustering with flexible choices of feature transforms and loss objectives.

Related papers

Exact and Heuristic Algorithms for Constrained Biclustering [0.0]
Biclustering, also known as co-clustering or two-way clustering, simultaneously partitions the rows and columns of a data matrix to reveal submatrices with coherent patterns.<n>We study constrained biclustering with pairwise constraints, namely must-link and cannot-link constraints, which specify whether objects should belong to the same or different biclusters.
arXiv Detail & Related papers (2025-08-07T15:29:22Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering [14.880015659013681]
We introduce a novel optimal transport-based pseudo-label learning framework.<n>Our framework generates high-quality and imbalance-aware pseudo-labels.<n>Experiments on various datasets, including a human-curated long-tailed CIFAR100, demonstrate the superiority of our method.
arXiv Detail & Related papers (2024-04-04T13:46:52Z)
Anchor-free Clustering based on Anchor Graph Factorization [17.218481911995365]
We introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF) AFCAGF innovates in learning the anchor graph, requiring only the computation of pairwise distances between samples. We evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.
arXiv Detail & Related papers (2024-02-24T02:16:42Z)
Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex convex model for semi/library-based unmixing. We demonstrate the efficacy of Alternating Methods of sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
A provable initialization and robust clustering method for general mixture models [6.806940901668607]
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Most recent results focus on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian errors.
arXiv Detail & Related papers (2024-01-10T22:56:44Z)
Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability. IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct distance matrix between data points by Butterworth filter. To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Clustering based on Mixtures of Sparse Gaussian Processes [6.939768185086753]
How to cluster data using their low dimensional embedded space is still a challenging problem in machine learning. In this article, we focus on proposing a joint formulation for both clustering and dimensionality reduction. Our algorithm is based on a mixture of sparse Gaussian processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC)
arXiv Detail & Related papers (2023-03-23T20:44:36Z)
Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation. We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z)
Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function. We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z)
Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds. We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors. Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs) Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.