A cutting plane algorithm for globally solving low dimensional k-means
clustering problems
- URL: http://arxiv.org/abs/2402.13595v1
- Date: Wed, 21 Feb 2024 07:55:33 GMT
- Title: A cutting plane algorithm for globally solving low dimensional k-means
clustering problems
- Authors: Martin Ryner, Jan Kronqvist, Johan Karlsson
- Abstract summary: We consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem.
This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time.
The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results.
- Score: 4.5594982923247995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is one of the most fundamental tools in data science and machine
learning, and k-means clustering is one of the most common such methods. There
is a variety of approximate algorithms for the k-means problem, but computing
the globally optimal solution is in general NP-hard. In this paper we consider
the k-means problem for instances with low dimensional data and formulate it as
a structured concave assignment problem. This allows us to exploit the low
dimensional structure and solve the problem to global optimality within
reasonable time for large data sets with several clusters. The method builds on
iteratively solving a small concave problem and a large linear programming
problem. This gives a sequence of feasible solutions along with bounds, which we
show converge to a zero optimality gap. The paper combines methods from global
optimization theory to accelerate the procedure, and we provide numerical
results on their performance.
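The bound-tightening loop the abstract describes (a large LP and a small subproblem, iterated so that feasible solutions and bounds squeeze the optimality gap to zero) is the classical cutting-plane mechanism. The sketch below illustrates that mechanism with Kelley's method on a toy convex problem; it is not the paper's structured concave assignment algorithm, and the objective, box constraints, tolerance, and iteration cap are all illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

C = np.array([0.3, -0.7])            # minimizer of the toy objective

def f(x):                            # convex toy objective f(x) = ||x - C||^2
    return float(np.sum((x - C) ** 2))

def grad_f(x):
    return 2.0 * (x - C)

n, tol = 2, 1e-4
x = np.zeros(n)                      # point at which the first cut is taken
cuts_A, cuts_b = [], []              # rows of:  g^T x - t <= g^T x_i - f(x_i)
upper, lower = f(x), -np.inf

for it in range(200):
    g = grad_f(x)                    # tangent cut at x_i: t >= f(x_i) + g^T (x - x_i)
    cuts_A.append(np.append(g, -1.0))
    cuts_b.append(float(g @ x) - f(x))
    # "Large" LP over all accumulated cuts: minimize t over the box [-1, 1]^2.
    res = linprog(c=[0.0, 0.0, 1.0],
                  A_ub=np.asarray(cuts_A), b_ub=np.asarray(cuts_b),
                  bounds=[(-1, 1), (-1, 1), (None, None)])
    x, lower = res.x[:n], res.fun    # LP value underestimates the true minimum
    upper = min(upper, f(x))         # every LP point is feasible: upper bound
    if upper - lower < tol:          # stop once the optimality gap has closed
        break

print(f"iterations: {it + 1}, final bounds: [{lower:.5f}, {upper:.5f}]")

In the paper's setting the objective is concave and the feasible set is an assignment polytope, so the cuts must be constructed differently, but the abstract describes the same pattern: each iteration yields a feasible solution and a bound, and the two converge to a zero optimality gap.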
Related papers
- From Large to Small Datasets: Size Generalization for Clustering
Algorithm Selection [12.993073967843292]
We study the clustering algorithm selection problem in a semi-supervised setting with an unknown ground-truth clustering.
We introduce a notion of size generalization for clustering algorithm accuracy.
We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z)
- A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical
Initialization [107.07093621337084]
Normalized-Cut (N-Cut) is a famous model of spectral clustering.
Traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of the normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation (see the sketch after this list).
We propose a novel N-Cut solver based on the famous coordinate descent method.
arXiv Detail & Related papers (2023-11-26T07:11:58Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average-linkage clustering.
We found a way of efficiently computing the density of a partitioning, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z)
- Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We give a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain an algorithm that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Randomized Greedy Algorithms and Composable Coreset for k-Center
Clustering with Outliers [11.546734084378683]
The presence of outliers can significantly increase the computational complexity.
Our idea is inspired by the greedy method that was developed for solving the ordinary $k$-center clustering problem.
arXiv Detail & Related papers (2023-01-07T09:26:01Z)
- On the Global Solution of Soft k-Means [159.23423824953412]
This paper presents an algorithm to solve the Soft k-Means problem globally.
A new model, named Minimal Volume Soft kMeans (MVSkM), is proposed to address the solution non-uniqueness issue.
arXiv Detail & Related papers (2022-12-07T12:06:55Z)
- How to Use K-means for Big Data Clustering? [2.1165011830664677]
K-means is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model.
We propose a new parallel scheme of using K-means and K-means++ algorithms for big data clustering.
arXiv Detail & Related papers (2022-04-14T08:18:01Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering [2.5182813818441945]
The recently introduced convex clustering approach formulates clustering as a convex optimization problem.
State-of-the-art convex clustering algorithms require large computation and memory space.
In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering.
arXiv Detail & Related papers (2020-06-22T20:02:59Z)
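For the two-stage N-Cut pipeline mentioned in the "Novel Normalized-Cut Solver" entry above, a minimal sketch of the traditional baseline follows. It assumes a Gaussian affinity matrix with bandwidth sigma, and the toy dataset is made up for the example; this is the standard spectral-embedding-plus-K-means scheme that the cited paper replaces with a direct coordinate descent solver.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def two_stage_ncut(X, k, sigma=1.0):
    # Gaussian affinity matrix (illustrative choice, not from the cited paper).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Stage 1: continuous spectral embedding from the normalized Laplacian
    # L_sym = I - D^{-1/2} W D^{-1/2}; keep its k smallest eigenvectors.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, U = eigh(L_sym, subset_by_index=[0, k - 1])
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    # Stage 2: discretize the continuous embedding via K-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Toy usage: two well-separated blobs should be recovered exactly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(2.0, 0.1, (30, 2))])
print(two_stage_ncut(X, k=2))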
This list is automatically generated from the titles and abstracts of the papers on this site.