A Geometric Approach to $k$-means
- URL: http://arxiv.org/abs/2201.04822v2
- Date: Mon, 22 Sep 2025 21:01:07 GMT
- Title: A Geometric Approach to $k$-means
- Authors: Jiazhen Hong, Wei Qian, Yudong Chen, Yuqian Zhang
- Abstract summary: $k$-means clustering is a fundamental problem in many engineering domains. We propose a general framework for escaping undesirable local solutions and recovering the global solution or the ground truth clustering.
- Score: 12.972775894401943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: \kmeans clustering is a fundamental problem in many scientific and engineering domains. The optimization problem associated with \kmeans clustering is nonconvex, for which standard algorithms are only guaranteed to find a local optimum. Leveraging the hidden structure of local solutions, we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution or the ground truth clustering. This framework consists of iteratively alternating between two steps: (i) detect mis-specified clusters in a local solution, and (ii) improve the local solution by non-local operations. We discuss specific implementations of these steps, and elucidate how the proposed framework unifies many existing variants of \kmeans algorithms through a geometric perspective. We also present two natural variants of the proposed framework, where the initial number of clusters may be over- or under-specified. We provide theoretical justifications and extensive experiments to demonstrate the efficacy of the proposed approach.
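To make the two alternating steps concrete, here is a minimal, hypothetical Python sketch of the detect-and-improve loop. The detection rule (flagging the two closest centers as a possibly over-split cluster) and the non-local move (merging the pair and re-seeding one center at the point farthest from all centers) are illustrative assumptions, not the paper's exact procedures.

```python
import numpy as np

def lloyd(X, centers, n_iter=50):
    """Standard Lloyd iterations: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

def cost(X, centers, labels):
    """k-means objective: total squared distance to assigned centers."""
    return ((X - centers[labels]) ** 2).sum()

def detect_and_escape(X, k, n_rounds=5, seed=0):
    """Hypothetical detect/improve loop in the spirit of the framework:
    (i) detect a possibly mis-specified cluster by flagging the two closest
        centers (suggesting one true cluster was split), then
    (ii) apply a non-local move: merge the pair and re-seed one center at the
        point farthest from all current centers, followed by Lloyd refinement."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    centers, labels = lloyd(X, centers)
    best = cost(X, centers, labels)
    for _ in range(n_rounds):
        cand = centers.copy()
        # (i) detection: find the closest pair of centers
        dc = ((cand[:, None] - cand[None, :]) ** 2).sum(-1)
        np.fill_diagonal(dc, np.inf)
        i, j = np.unravel_index(dc.argmin(), dc.shape)
        # (ii) non-local improvement: merge i and j, re-seed j far away
        cand[i] = 0.5 * (cand[i] + cand[j])
        cand[j] = X[((X[:, None] - cand[None]) ** 2).sum(-1).min(1).argmax()]
        cand, cand_labels = lloyd(X, cand)
        if cost(X, cand, cand_labels) < best:
            centers, labels, best = cand, cand_labels, cost(X, cand, cand_labels)
    return centers, labels
```

Each round either keeps the incumbent solution or accepts a strictly cheaper candidate, so the k-means cost is non-increasing across rounds.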
Related papers
- Approximation Algorithm for Constrained $k$-Center Clustering: A Local Search Approach [10.257446766550862]
We study the constrained k-center clustering problem, where instance-level cannot-link (CL) and must-link (ML) constraints are incorporated as background knowledge. We propose a novel local search framework based on a transformation to a dominating matching set problem, achieving the best possible approximation ratio of 2.
arXiv Detail & Related papers (2026-01-17T02:39:41Z) - Modified K-means Algorithm with Local Optimality Guarantees [10.936166435599574]
The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. In this paper, we present conditions under which the K-means algorithm converges to a locally optimal solution. We propose simple modifications to the K-means algorithm which ensure local optimality in both the continuous and discrete sense.
arXiv Detail & Related papers (2025-06-08T04:37:28Z) - SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization [35.92829763686735]
This paper studies decentralized bilevel optimization, in which multiple agents collaborate to solve problems involving nested optimization structures with neighborhood communications.
We propose SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization.
We present a unified convergence analysis for SPARKLE, applicable to all its variants, with state-of-the-art convergence rates compared to existing decentralized bilevel algorithms.
arXiv Detail & Related papers (2024-11-21T14:23:06Z) - A machine learning framework for neighbor generation in metaheuristic search [4.521119623956821]
We propose a general machine learning framework for neighbor generation in metaheuristic search.
We validate our methodology on two metaheuristic applications.
arXiv Detail & Related papers (2022-12-22T01:58:04Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
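As an illustration of the clustering-based aggregation step only (the users' local computations and the paper's error-rate guarantees are not reproduced), a minimal server-side sketch might look like this:

```python
import numpy as np

def one_shot_clustered_aggregation(local_estimates, k, n_iter=20, seed=0):
    """Illustrative server-side step: each user sends a single locally
    computed estimate; the server clusters the received estimates with
    k-means and returns one averaged model per cluster of users."""
    Z = np.asarray(local_estimates, dtype=float)      # (num_users, dim)
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(0)   # per-cluster average
    return centers, labels
```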
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - ARES: An Efficient Algorithm with Recurrent Evaluation and Sampling-Driven Inference for Maximum Independent Set [48.57120672468062]
This paper introduces an efficient algorithm for the Maximum Independent Set (MIS) problem, incorporating two innovative techniques.
The proposed algorithm outperforms state-of-the-art algorithms in terms of solution quality, computational efficiency, and stability.
arXiv Detail & Related papers (2022-08-16T14:39:38Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
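A minimal sketch of such an alternating gradient-based update, assuming the squared-error cost and a fixed step size (both illustrative choices, not necessarily the paper's):

```python
import numpy as np

def gradient_clustering(X, k, step=0.3, n_iter=200, seed=0):
    """Illustrative alternating procedure: (a) assign each point to its
    nearest center, (b) move each center by a gradient step on the
    squared-error cost instead of jumping to the exact cluster mean."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)   # (a)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                # gradient of sum_i ||x_i - c_j||^2 w.r.t. c_j, averaged over the cluster
                grad = 2.0 * (centers[j] - pts.mean(0))
                centers[j] -= step * grad                                # (b)
    return centers, labels
```

With a step size of 0.5 the update coincides with the exact mean (Lloyd's step); smaller steps give a genuinely gradient-based variant.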
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of nonconvex problems.
We show that for weakly regularized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps [0.0]
We propose theoretical frameworks for the analysis of distributed methods with error compensation and local updates.
We develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first distributed Local-SGD methods.
arXiv Detail & Related papers (2021-12-20T16:12:54Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from max-k-cut.
The proposed algorithm is scalable and outperforms state-of-the-art algorithms on real-world data with little additional time cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - From Local SGD to Local Fixed-Point Methods for Federated Learning [17.04886864943171]
We consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting.
We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
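A toy sketch of the first strategy, in which each node runs a fixed number of local fixed-point steps before the iterates are averaged (the operators and the contraction example below are assumptions for illustration):

```python
import numpy as np

def local_fixed_point(operators, x0, local_steps=5, rounds=20):
    """Sketch of the first strategy: every node applies its own operator T_i
    for a fixed number of local steps, then all nodes average their iterates
    (one communication round) and repeat."""
    xs = [np.array(x0, dtype=float) for _ in operators]
    for _ in range(rounds):
        for i, T in enumerate(operators):
            for _ in range(local_steps):
                xs[i] = T(xs[i])              # local fixed-point iterations
        avg = np.mean(xs, axis=0)             # consensus / averaging step
        xs = [avg.copy() for _ in operators]
    return xs[0]

# Toy example: each operator contracts toward a node-specific target, so the
# consensus iterate converges to the average of the targets.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
ops = [lambda x, t=t: 0.5 * (x + t) for t in targets]
print(local_fixed_point(ops, x0=np.zeros(2)))   # approximately [1.0, 1.0]
```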
arXiv Detail & Related papers (2020-04-03T09:24:10Z) - Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z) - Structures of Spurious Local Minima in $k$-means [20.155509538529568]
We investigate the structures of spurious local solutions to the $k$-means problem.
We prove that this is essentially the only type of spurious local minima under a separation condition.
arXiv Detail & Related papers (2020-02-16T22:15:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.