Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means
- URL: http://arxiv.org/abs/2407.11217v1
- Date: Mon, 15 Jul 2024 20:04:06 GMT
- Title: Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means
- Authors: Max Dupré la Tour, David Saulpic,
- Abstract summary: We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering.
In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
- Score: 4.271492285528115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is one of the staples of data analysis and unsupervised learning. As such, clustering algorithms are often used on massive data sets, and they need to be extremely fast. We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering. For these, the go-to algorithm is $k$-means++, which yields an $O(\log k)$-approximation in time $\tilde O(nkd)$. While it is possible to improve either the approximation factor [Lattanzi and Sohler, ICML19] or the running time [Cohen-Addad et al., NeurIPS 20], it is unknown how precise a linear-time algorithm can be. In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Learning Arithmetic Formulas in the Presence of Noise: A General
Framework and Applications to Unsupervised Learning [4.10375234787249]
We present a framework for designing efficient algorithms for unsupervised learning problems.
Our framework is based on a meta algorithm that learns arithmetic circuits in the presence of noise.
arXiv Detail & Related papers (2023-11-13T12:26:25Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(mathrmnnz(X) + nlog n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $smashwidetildeO(k4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Quantum Coreset Construction Algorithms for Clustering [15.513270929560088]
We give quantum algorithms that find coresets for $k$-clustering in $mathbbRd$ with $tildeO(sqrtnkd3/2)$ query complexity.
Our coreset reduces the input size from $n$ to $mathrmpoly(kepsilon-1d)$, so that existing $alpha$-approximation algorithms for clustering can run on top of it.
arXiv Detail & Related papers (2023-06-05T12:22:46Z) - Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: that any $epsilon$-DP algorithm must exhibit $O(|V|2/ epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - A Faster $k$-means++ Algorithm [11.428775569173638]
We present a new algorithm that can solve the $k$-means++ problem with nearly optimal running time.
We propose a new algorithm textscFastKmeans++ that only takes in $widetildeO(nd + nk2)$ time, in total.
arXiv Detail & Related papers (2022-11-28T08:17:12Z) - Practical, Provably-Correct Interactive Learning in the Realizable
Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z) - Structure learning in polynomial time: Greedy algorithms, Bregman
information, and exponential families [12.936601424755944]
We study a general greedy score-based algorithm for learning DAGs.
We show how recent algorithm-time algorithms for learning DAG models are a special case of this algorithm.
This observation suggests new score functions and optimality conditions based on the duality between Bregman divergences and exponential families.
arXiv Detail & Related papers (2021-10-10T06:37:51Z) - Sublinear Time and Space Algorithms for Correlation Clustering via
Sparse-Dense Decompositions [9.29659220237395]
We present a new approach for solving (minimum disagreement) correlation clustering.
We obtain sublinear algorithms with highly efficient time and space complexity.
The main ingredient of our approach is a novel connection to sparse-dense graph decompositions.
arXiv Detail & Related papers (2021-09-29T16:25:02Z) - Algorithm Selection on a Meta Level [58.720142291102135]
We introduce the problem of meta algorithm selection, which essentially asks for the best way to combine a given set of algorithm selectors.
We present a general methodological framework for meta algorithm selection as well as several concrete learning methods as instantiations of this framework.
arXiv Detail & Related papers (2021-07-20T11:23:21Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs.
We define an algorithmic framework for hierarchical agglomerative graph clustering.
We show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
arXiv Detail & Related papers (2021-06-10T09:29:05Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Identifying Co-Adaptation of Algorithmic and Implementational
Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of
Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z) - Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from Leiden-k-cut.
This proposed algorithm is scalable, outperforms state-of-the-art algorithms, and outperforms in real-world time with little additional cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $widetildeO(ndk)$ for all $k = O(sqrtd) cup Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $widetildeO(ndk)$ for all $k$, at the expense of an $O(sqrtlog k)$ factor in the recovery guarantee
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.