A Constant Approximation Algorithm for Sequential No-Substitution
k-Median Clustering under a Random Arrival Order
- URL: http://arxiv.org/abs/2102.04050v1
- Date: Mon, 8 Feb 2021 08:25:29 GMT
- Title: A Constant Approximation Algorithm for Sequential No-Substitution
k-Median Clustering under a Random Arrival Order
- Authors: Tom Hess, Michal Moshkovitz and Sivan Sabato
- Abstract summary: We study k-median clustering under the sequential no-substitution setting.
In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers.
We give a new algorithm for this setting that obtains a constant approximation factor on the optimal risk under a random arrival order.
- Score: 24.304228393096395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study k-median clustering under the sequential no-substitution setting. In
this setting, a data stream is sequentially observed, and some of the points
are selected by the algorithm as cluster centers. However, a point can be
selected as a center only immediately after it is observed, before observing
the next point. In addition, a selected center cannot be substituted later. We
give a new algorithm for this setting that obtains a constant approximation
factor on the optimal risk under a random arrival order. This is the first such
algorithm that holds without any assumptions on the input data and selects a
non-trivial number of centers. The number of selected centers is quasi-linear
in k. Our algorithm and analysis are based on a careful risk estimation that
avoids outliers, a new concept of a linear bin division, and repeated
calculations using an offline clustering algorithm.
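The no-substitution constraint can be made concrete with a small sketch. The driver below is not the paper's algorithm; it only illustrates the setting, with a hypothetical placeholder rule (`far_from_centers`) standing in for the paper's careful risk-estimation-based selection.

```python
import random

def sequential_no_substitution(stream, decide):
    """Drive the no-substitution setting: each point is offered exactly once,
    immediately after arrival; accepted points become permanent centers."""
    centers = []
    for x in stream:
        if decide(x, centers):   # must decide before seeing the next point
            centers.append(x)    # a selected center can never be substituted
    return centers

# Placeholder rule (NOT the paper's algorithm): accept a point only if it is
# far from every current center, up to a fixed budget of centers.
def far_from_centers(threshold, budget):
    def decide(x, centers):
        if len(centers) >= budget:
            return False
        return all(abs(x - c) > threshold for c in centers)
    return decide

random.seed(0)
stream = [random.gauss(mu, 0.3) for mu in (0, 5, 10) for _ in range(20)]
random.shuffle(stream)  # the paper's guarantee assumes a random arrival order
centers = sequential_no_substitution(stream, far_from_centers(2.0, 3))
print(sorted(round(c, 2) for c in centers))
```

With three well-separated 1-D clusters, the placeholder rule ends up keeping one center per cluster; the paper's contribution is a rule whose risk is a constant factor from optimal without any such separation assumption.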
Related papers
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improving initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained only 7.33%, 7.90%, and 0% clustering error in Iris, Letter, and Ruspini data respectively.
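A minimal sketch of this style of seeding, with one simplification: instead of running a Convex Hull computation, the first two centroids are taken as the farthest-apart pair of points (a stand-in for the hull step), and the remaining centers follow the distance rule described above.

```python
import itertools
import math

def initial_centers(points, k):
    """Distance-based seeding sketch: the first two centers are the
    farthest-apart pair of points; each remaining center is the point
    farthest from all previously selected centers."""
    c0, c1 = max(itertools.combinations(points, 2),
                 key=lambda pair: math.dist(pair[0], pair[1]))
    centers = [c0, c1]
    while len(centers) < k:
        nxt = max(points,
                  key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

pts = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0), (1.0, 1.0), (9.0, 1.0)]
centers = initial_centers(pts, 3)
print(centers)  # the two farthest points, then the point farthest from both
```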
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
- An Improved Greedy Algorithm for Subset Selection in Linear Estimation [5.994412766684842]
We consider a subset selection problem in a spatial field where we seek to find a set of k locations whose observations provide the best estimate of the field value at a finite set of prediction locations.
One approach for observation selection is to perform a grid discretization of the space and obtain an approximate solution using the greedy algorithm.
We propose a method to reduce the computational complexity by considering a search space consisting only of prediction locations and centroids of cliques formed by the prediction locations.
arXiv Detail & Related papers (2022-03-30T05:52:16Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
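The two-step procedure can be sketched for the squared-distance cost (one instance of the general framework; the data, learning rate, and stopping rule here are illustrative choices, not the paper's):

```python
import random

def gradient_clustering(points, k, lr=0.5, iters=50, seed=0):
    """Alternate between (1) assigning each point to its nearest center and
    (2) moving each center along the negative gradient of the squared-distance
    cost restricted to its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Step 1: cluster assignment
        groups = {i: [] for i in range(k)}
        for x in points:
            i = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[i].append(x)
        # Step 2: gradient step per center; d/dc sum_x (x - c)^2 = -2 sum_x (x - c)
        for i, xs in groups.items():
            if xs:
                grad = -2.0 * sum(x - centers[i] for x in xs)
                centers[i] -= lr * grad / len(xs)
    return sorted(centers)

points = [0.0, 0.1, 0.2, 9.9, 10.0, 10.1]
centers = gradient_clustering(points, 2)
print(centers)
```

Note that with `lr=0.5` the gradient step on the squared cost coincides exactly with the classical k-means mean update; other learning rates give partial moves toward the cluster mean.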
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
- Determinantal consensus clustering [77.34726150561087]

We propose the use of determinantal point processes (DPP) for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, in contrast to DPP, standard uniform random selection fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Relational Algorithms for k-means Clustering [17.552485682328772]
This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model.
The running time is potentially exponentially smaller than $N$, the number of data points to be clustered that the relational database represents.
arXiv Detail & Related papers (2020-08-01T23:21:40Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
- Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its fast speed, absence of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
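The paper's ball-based neighbor-cluster machinery is more involved than can be shown here; as a rough sketch of the same goal, the assignment step below skips point-centroid distance computations using a triangle-inequality bound between centers (an Elkan-style pruning, used here only to illustrate the idea of avoiding redundant distance computations):

```python
import math

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping a candidate center j
    whenever half the distance between the current best center and j already
    exceeds the point's distance to its best center: by the triangle
    inequality, j cannot be strictly closer."""
    half = [[math.dist(a, b) / 2 for b in centers] for a in centers]
    labels, skipped = [], 0
    for x in points:
        best, d_best = 0, math.dist(x, centers[0])
        for j in range(1, len(centers)):
            if half[best][j] >= d_best:
                skipped += 1          # pruned: centers[j] cannot win
                continue
            d = math.dist(x, centers[j])
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, skipped

centers = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
points = [(1.0, 0.0), (9.0, 0.0)]
labels, skipped = assign_with_pruning(points, centers)
print(labels, skipped)
```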
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-02-07T10:20:49Z)
- A novel initialisation based on hospital-resident assignment for the k-modes algorithm [0.0]
This paper presents a new way of selecting an initial solution for the k-modes algorithm.
It allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not.
arXiv Detail & Related papers (2020-02-07T10:20:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.