Consistency of Anchor-based Spectral Clustering
- URL: http://arxiv.org/abs/2006.13984v2
- Date: Sat, 27 Jun 2020 12:27:00 GMT
- Title: Consistency of Anchor-based Spectral Clustering
- Authors: Henry-Louis de Kergorlay, Desmond John Higham
- Abstract summary: Anchor-based techniques reduce the computational complexity of spectral clustering algorithms.
We define a specific anchor-based algorithm and show that it is amenable to rigorous analysis, as well as being effective in practice.
We find that it is competitive with the state-of-the-art LSC method of Chen and Cai.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anchor-based techniques reduce the computational complexity of spectral
clustering algorithms. Although empirical tests have shown promising results,
there is currently a lack of theoretical support for the anchoring approach. We
define a specific anchor-based algorithm and show that it is amenable to
rigorous analysis, as well as being effective in practice. We establish the
theoretical consistency of the method in an asymptotic setting where data is
sampled from an underlying continuous probability distribution. In particular,
we provide sharp asymptotic conditions for the algorithm parameters which
ensure that the anchor-based method can recover with high probability disjoint
clusters that are mutually separated by a positive distance. We illustrate the
performance of the algorithm on synthetic data and explain how the theoretical
convergence analysis can be used to inform the practical choice of parameter
scalings. We also test the accuracy and efficiency of the algorithm on two
large scale real data sets. We find that the algorithm offers clear advantages
over standard spectral clustering. We also find that it is competitive with the
state-of-the-art LSC method of Chen and Cai (Twenty-Fifth AAAI Conference on
Artificial Intelligence, 2011), while having the added benefit of a consistency
guarantee.
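The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of a generic anchor (landmark) spectral clustering pipeline, loosely in the spirit of LSC-style methods, and not the authors' algorithm. The function names, the uniform random choice of anchors, the Gaussian kernel, the `bandwidth` parameter and the small k-means helper are all assumptions made for illustration. The point it shows is where the cost saving comes from: with n points and m anchors, only an n x m affinity matrix is built and decomposed, rather than an n x n one.

```python
import numpy as np


def _kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialisation (illustrative helper)."""
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def anchor_spectral_clustering(X, n_clusters, n_anchors, bandwidth, seed=0):
    """Hypothetical anchor-based spectral clustering sketch (not the paper's method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: anchors are a uniform random subsample of the data points.
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]
    # n x m Gaussian affinities between data points and anchors.
    sq_dists = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Normalise rows to sum to one, then scale columns by their inverse root sums.
    Z = Z / Z.sum(axis=1, keepdims=True)
    Z = Z / np.sqrt(Z.sum(axis=0, keepdims=True))
    # Leading left singular vectors of the n x m matrix give the spectral embedding.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    embedding = U[:, :n_clusters]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return _kmeans(embedding, n_clusters, rng)


# Usage on two synthetic, well-separated blobs.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 0.3, size=(100, 2)),
    data_rng.normal(5.0, 0.3, size=(100, 2)),
])
labels = anchor_spectral_clustering(X, n_clusters=2, n_anchors=20, bandwidth=1.0)
```

In this sketch the number of anchors and the kernel bandwidth are free parameters chosen by hand; the abstract indicates that the paper's convergence analysis is meant to inform how such parameters should scale, which the sketch does not attempt to reproduce.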
Related papers
- Quantized Hierarchical Federated Learning: A Robust Approach to Statistical Heterogeneity [3.8798345704175534]
We present a novel hierarchical federated learning algorithm that incorporates quantization for communication-efficiency.
We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate.
Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters.
arXiv Detail & Related papers (2024-03-03T15:40:24Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously tolerate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Adversarially robust clustering with optimality guarantees [7.0830450321883935]
We consider the problem of clustering data points coming from sub-Gaussian mixtures.
Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers.
We propose a simple robust algorithm based on the coordinatewise median that obtains the optimal mislabeling rate even when we allow adversarial outliers to be present.
arXiv Detail & Related papers (2023-06-16T17:17:07Z)
- Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Adaptive Resonance Theory-based Topological Clustering with a Divisive Hierarchical Structure Capable of Continual Learning [8.581682204722894]
This paper proposes an ART-based topological clustering algorithm with a mechanism that automatically estimates a similarity threshold from a distribution of data points.
To improve information extraction performance, a divisive hierarchical clustering algorithm capable of continual learning is proposed.
arXiv Detail & Related papers (2022-01-26T02:34:52Z)
- An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees [4.007017852999008]
We propose a new iterative algorithm to cluster networks with side information for nodes.
We show that our algorithm is optimal under the Contextual Symmetric Block Model.
arXiv Detail & Related papers (2021-12-20T12:04:07Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)