Convergence and Recovery Guarantees of the K-Subspaces Method for
Subspace Clustering
- URL: http://arxiv.org/abs/2206.05553v1
- Date: Sat, 11 Jun 2022 15:47:21 GMT
- Title: Convergence and Recovery Guarantees of the K-Subspaces Method for
Subspace Clustering
- Authors: Peng Wang, Huikang Liu, Anthony Man-Cho So, Laura Balzano
- Abstract summary: The K-subspaces (KSS) method is a generalization of the K-means method for subspace clustering.
We show that if the initial assignment of the KSS method lies within a neighborhood of a true clustering, it converges at a superlinear rate.
- Score: 33.1464583756714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The K-subspaces (KSS) method is a generalization of the K-means method for
subspace clustering. In this work, we present local convergence analysis and a
recovery guarantee for KSS, assuming data are generated by the semi-random
union of subspaces model, where $N$ points are randomly sampled from $K \ge 2$
overlapping subspaces. We show that if the initial assignment of the KSS method
lies within a neighborhood of a true clustering, it converges at a superlinear
rate and finds the correct clustering within $\Theta(\log\log N)$ iterations
with high probability. Moreover, we propose a thresholding inner-product based
spectral method for initialization and prove that it produces a point in this
neighborhood. We also present numerical results of the studied method to
support our theoretical developments.
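To fix ideas, here is a minimal NumPy sketch of the KSS alternating scheme analyzed in the paper (not the authors' implementation): assign each point to the subspace with the smallest residual, then refit each subspace by a rank-$d$ truncated SVD of its cluster. The common subspace dimension `d` and an initial assignment (e.g., from the proposed thresholding inner-product based spectral method) are taken as given.

```python
import numpy as np

def kss(X, K, d, init_labels, max_iter=50):
    """K-subspaces sketch. X: (ambient_dim, N) data matrix, columns are
    points; init_labels: initial clustering in {0, ..., K-1}."""
    labels = init_labels.copy()
    for _ in range(max_iter):
        # Update step: fit a d-dimensional subspace to each cluster by SVD.
        bases = []
        for k in range(K):
            Xk = X[:, labels == k]
            U, _, _ = np.linalg.svd(Xk, full_matrices=False)
            bases.append(U[:, :d])           # orthonormal basis of subspace k
        # Assignment step: send each point to the subspace minimizing the
        # residual ||x - U_k U_k^T x||.
        proj = np.stack([U @ (U.T @ X) for U in bases])   # (K, dim, N)
        resid = np.linalg.norm(X[None] - proj, axis=1)    # (K, N)
        new_labels = resid.argmin(axis=0)
        if np.array_equal(new_labels, labels):            # converged
            break
        labels = new_labels
    return labels, bases
```

Under the paper's assumptions, an initialization in the stated neighborhood makes this loop terminate with the correct clustering within $\Theta(\log\log N)$ iterations with high probability.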
Related papers
- A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights [27.253757893368387]
Convex clustering is a well-regarded clustering method, resembling the centroid-based approach of Lloyd's $k$-means. We propose a kernel embedding extension of the convex clustering method that transforms data points into a Reproducing Kernel Hilbert Space (RKHS).
arXiv Detail & Related papers (2025-11-07T11:24:22Z)
- Subspace Clustering of Subspaces: Unifying Canonical Correlation Analysis and Subspace Clustering [26.936899997676264]
We introduce a novel framework for clustering a collection of tall matrices based on their column spaces. Our formulation directly models each data sample as a matrix and clusters them according to their underlying subspaces. Experiments on real-world hyperspectral imaging datasets demonstrate that our method achieves superior clustering accuracy and robustness.
arXiv Detail & Related papers (2025-09-23T05:12:40Z) - Kernel K-means clustering of distributional data [0.0]
We consider the problem of clustering a sample of probability distributions drawn from a random distribution on $\mathbb{R}^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing kernel Hilbert space $\mathcal{H}$. By mapping each distribution to its corresponding kernel mean embedding in $\mathcal{H}$, we obtain a sample in this RKHS where we carry out the $K$-means clustering procedure.
arXiv Detail & Related papers (2025-09-22T17:11:29Z)
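To illustrate the embed-then-cluster pipeline summarized above, here is a minimal NumPy sketch (not the paper's code): each distribution is represented by a finite sample, empirical kernel mean embeddings are compared via a Gaussian kernel, and K-means runs in the RKHS using only the Gram matrix. The bandwidth `gamma` and the random initialization are illustrative assumptions.

```python
import numpy as np

def embedding_gram(samples, gamma=1.0):
    """Gram matrix of empirical kernel mean embeddings: entry (i, j) is the
    average Gaussian kernel value over all pairs from samples i and j."""
    def k_bar(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2).mean()
    n = len(samples)
    return np.array([[k_bar(samples[i], samples[j]) for j in range(n)]
                     for i in range(n)])

def kernel_kmeans(G, K, iters=100, seed=0):
    """K-means in the RKHS, computed from the Gram matrix alone."""
    labels = np.random.default_rng(seed).integers(K, size=G.shape[0])
    for _ in range(iters):
        D = np.empty((G.shape[0], K))            # squared RKHS distances
        for c in range(K):
            idx = labels == c
            m = idx.sum()
            if m == 0:
                D[:, c] = np.inf                 # empty cluster never wins
                continue
            # ||phi_i - mu_c||^2 up to the constant G_ii
            D[:, c] = -2 * G[:, idx].sum(1) / m + G[np.ix_(idx, idx)].sum() / m**2
        new = D.argmin(1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```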
- Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs). Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with $O(n^2)$ complexity, followed by spectral clustering. We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z)
- Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains. We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
- One-shot Robust Federated Learning of Independent Component Analysis [16.462282750354408]
We propose a geometric median-based aggregation algorithm that leverages $k$-means clustering to resolve the permutation ambiguity in local client estimations. Our method first performs $k$-means to partition client-provided estimators into clusters and then aggregates estimators within each cluster using the geometric median.
arXiv Detail & Related papers (2025-05-26T21:37:19Z)
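The per-cluster aggregation step can be illustrated with the standard Weiszfeld iteration for the geometric median (a textbook algorithm, not necessarily the paper's exact routine); the client estimators within one cluster are assumed to be stacked as rows of a matrix:

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld iteration: minimizes the sum of Euclidean distances
    from a single vector to the rows of `points`."""
    y = points.mean(axis=0)                      # start from the mean
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)             # guard against zero distance
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y
```

Unlike the mean, this aggregate is robust to a minority of corrupted client estimates, which is the point of using it in the one-shot robust setting.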
- Radius-Guided Post-Clustering for Shape-Aware, Scalable Refinement of k-Means Results [1.9580473532948401]
After standard k-means, each cluster center is assigned a radius (the distance to its farthest assigned point), and clusters whose radii overlap are merged. This post-processing step loosens the requirement for an exact k: as long as k is overestimated, overlapping clusters can later be merged. The method can often reconstruct non-convex shapes through meaningful merges.
arXiv Detail & Related papers (2025-04-28T22:30:53Z)
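A minimal sketch of this merge rule (single-pass, with union-find over cluster pairs; the merge order and transitive merging are assumptions on my part):

```python
import numpy as np

def merge_overlapping_clusters(X, centers, labels):
    """Merge k-means clusters whose radius balls intersect. The radius of a
    cluster is the distance from its center to its farthest member."""
    K = len(centers)
    radii = np.array([
        np.linalg.norm(X[labels == k] - centers[k], axis=1).max()
        if (labels == k).any() else 0.0
        for k in range(K)
    ])
    parent = list(range(K))                      # union-find over clusters
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]        # path compression
            a = parent[a]
        return a
    for i in range(K):
        for j in range(i + 1, K):
            if np.linalg.norm(centers[i] - centers[j]) <= radii[i] + radii[j]:
                parent[find(j)] = find(i)        # balls overlap: merge
    roots = np.array([find(k) for k in range(K)])
    return roots[labels]                         # merged label per point
```

Chained merges let several overlapping spherical clusters jointly approximate one elongated or non-convex shape.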
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A Unified Framework for Center-based Clustering of Distributed Data [46.86543102499174]
We develop a family of distributed center-based clustering algorithms that work over networks of users.
Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss.
For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points.
arXiv Detail & Related papers (2024-02-02T10:44:42Z)
- Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $\mathcal{O}(n\,\mathrm{polylog}(n))$, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Adaptive Graph Convolutional Subspace Clustering [10.766537212211217]
Spectral-type subspace clustering algorithms have shown excellent performance in many subspace clustering applications.
In this paper, inspired by graph convolutional networks, we use the graph convolution technique to develop a feature extraction method and a coefficient matrix constraint simultaneously.
We claim that by using AGCSC, the aggregated feature representation of original data samples is suitable for subspace clustering.
arXiv Detail & Related papers (2023-05-05T10:27:23Z)
- Jacobian-Scaled K-means Clustering for Physics-Informed Segmentation of Reacting Flows [0.0]
JSK-means clustering is a physics-informed clustering strategy centered on the K-means framework.
The algorithm is demonstrated on a complex reacting flow simulation dataset.
arXiv Detail & Related papers (2023-05-02T15:47:18Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments. A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
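As a hedged sketch of this two-step scheme, the snippet below uses the squared-Euclidean cost, for which the size-normalized gradient step on each center moves it toward its cluster mean; the step size `lr` is an illustrative assumption, and `lr = 1` recovers the Lloyd (K-means) update.

```python
import numpy as np

def gradient_clustering(X, K, lr=0.5, iters=100, seed=0):
    """Alternate cluster assignment with a gradient step on the centers
    for the cost f(C) = 0.5 * sum_i min_k ||x_i - c_k||^2."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance.
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        # Update step: the per-cluster gradient, normalized by cluster
        # size, is (c_k - mean of cluster k).
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk):
                C[k] -= lr * (C[k] - Xk.mean(axis=0))
    return labels, C
```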
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Distribution free optimality intervals for clustering [1.7513645771137178]
Given data $\mathcal{D}$ and a partition $\mathcal{C}$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data? This paper introduces a paradigm in which a clustering $\mathcal{C}$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e., the only good clustering up to small perturbations.
arXiv Detail & Related papers (2021-07-30T06:13:56Z)
- Weighted Sparse Subspace Representation: A Unified Framework for Subspace Clustering, Constrained Clustering, and Active Learning [0.3553493344868413]
We first propose a novel spectral-based subspace clustering algorithm that seeks to represent each point as a sparse convex combination of a few nearby points.
We then extend the algorithm to constrained clustering and active learning settings.
Our motivation for developing such a framework stems from the fact that typically either a small amount of labelled data is available in advance, or it is possible to label some points at a cost.
arXiv Detail & Related papers (2021-06-08T13:39:43Z)
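To make the "sparse convex combination of a few nearby points" idea from the last paper above concrete, here is a hedged sketch (not the paper's algorithm; its weighting scheme is omitted) that codes a point over its neighbours by projected gradient descent onto the probability simplex, whose projection naturally yields sparse coefficients:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex {w >= 0, sum w = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0)

def convex_coding(x, D, iters=200):
    """Represent x as a convex combination D @ w of nearby points (the
    columns of D), via projected gradient on 0.5 * ||x - D w||^2."""
    n = D.shape[1]
    w = np.full(n, 1.0 / n)
    lr = 1.0 / np.linalg.norm(D, 2) ** 2    # step 1/L, L = sigma_max(D)^2
    for _ in range(iters):
        w = project_simplex(w - lr * D.T @ (D @ w - x))
    return w                                # typically sparse
```

The resulting coefficients can populate an affinity matrix for spectral clustering, as is standard for spectral-type subspace methods.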