Consistent $k$-Median: Simpler, Better and Robust
- URL: http://arxiv.org/abs/2008.06101v1
- Date: Thu, 13 Aug 2020 20:24:28 GMT
- Title: Consistent $k$-Median: Simpler, Better and Robust
- Authors: Xiangyu Guo, Janardhan Kulkarni, Shi Li, Jiayi Xian
- Abstract summary: We show that a simple local-search based online algorithm can give a bicriteria constant approximation for the problem with $O(k^2 \log^2 (nD))$ swaps of medians (recourse) in total.
When restricted to the problem without outliers, our algorithm is simpler, deterministic and gives better approximation ratio and recourse.
- Score: 20.692372082600972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce and study the online consistent $k$-clustering
with outliers problem, generalizing the non-outlier version of the problem
studied in [Lattanzi-Vassilvitskii, ICML17].
We show that a simple local-search based online algorithm can give a
bicriteria constant approximation for the problem with $O(k^2 \log^2 (nD))$
swaps of medians (recourse) in total, where $D$ is the diameter of the metric.
When restricted to the problem without outliers, our algorithm is simpler,
deterministic and gives better approximation ratio and recourse, compared to
that of [Lattanzi-Vassilvitskii, ICML17].
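To make the notions of local search and recourse concrete, here is a minimal Python sketch of a generic single-swap local search run online. It is not the authors' algorithm: it ignores outliers, and the improvement threshold `(1 - eps / k)`, the parameter `eps`, the Euclidean metric, and the function names `kmedian_cost` and `online_local_search` are all assumptions chosen for illustration.

```python
import math


def kmedian_cost(points, medians):
    """Sum over all points of the distance to the nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)


def online_local_search(stream, k, eps=0.1):
    """Process points one at a time and keep k medians via single swaps.

    Returns the final medians and the total number of swaps (recourse).
    The swap rule (accept only if the cost drops by a (1 - eps/k) factor)
    is an illustrative assumption, not the rule from the paper.
    """
    points, medians, recourse = [], [], 0
    for p in stream:
        points.append(p)
        # Until k distinct medians exist, open the arriving point directly.
        if len(medians) < k:
            if p not in medians:
                medians.append(p)
            continue
        improved = True
        while improved:
            improved = False
            current = kmedian_cost(points, medians)
            # Try replacing median i with any non-median point seen so far.
            for i, cand in ((i, c) for i in range(k) for c in points):
                if cand in medians:
                    continue
                trial = medians[:i] + [cand] + medians[i + 1:]
                if kmedian_cost(points, trial) < (1 - eps / k) * current:
                    medians, recourse, improved = trial, recourse + 1, True
                    break
    return medians, recourse


# Hypothetical one-dimensional stream: a far-away point arrives last.
medians, recourse = online_local_search([(0.0,), (1.0,), (100.0,)], k=2)
```

Each accepted swap shrinks the current cost by a fixed factor, which is what lets this style of algorithm bound the total number of swaps; the specific bound $O(k^2 \log^2 (nD))$ quoted above belongs to the paper's algorithm and is not claimed for this sketch.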
Related papers
- Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems [8.74967598360817]
Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group.
We propose a novel "Relax and Merge" framework, where $\rho$ is the approximate ratio of an off-the-shelf vanilla $k$-means algorithm.
If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints.
arXiv Detail & Related papers (2024-11-02T02:50:12Z) - Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers [10.259254824702555]
We study the $k$-sparse Wasserstein Barycenter problem in the presence of outliers.
Existing WB algorithms cannot be directly extended to handle the case with outliers.
We propose a clustering based LP method that yields constant approximation factor for the $k$-sparse WB with outliers problem.
arXiv Detail & Related papers (2024-04-20T15:01:35Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Revisiting Rotation Averaging: Uncertainties and Robust Losses [51.64986160468128]
We argue that the main problem of current methods is the minimized cost function that is only weakly connected with the input data via the estimated epipolar.
We propose to better model the underlying noise distributions by directly propagating the uncertainty from the point correspondences into the rotation averaging.
arXiv Detail & Related papers (2023-03-09T11:51:20Z) - Randomized Greedy Algorithms and Composable Coreset for k-Center
Clustering with Outliers [11.546734084378683]
The presence of outliers can significantly increase the computational complexity.
Our idea is inspired by the greedy method that was developed for solving the ordinary $k$-center clustering problem.
arXiv Detail & Related papers (2023-01-07T09:26:01Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non-convex optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2} \log n) \cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Distributed k-Means with Outliers in General Metrics [0.6117371161379208]
We present a distributed coreset-based 3-round approximation algorithm for k-means with $z$ outliers for general metric spaces.
An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by the doubling dimension $D$ of the metric space.
arXiv Detail & Related papers (2022-02-16T16:24:31Z) - Minimax Optimal Algorithms with Fixed-$k$-Nearest Neighbors [13.231906521852718]
We consider a distributed learning scenario in which a massive dataset is split into smaller groups.
We propose optimal rules to aggregate the fixed-$k$-NN information for classification, regression, and density estimation.
We show that the distributed algorithm with a fixed $k$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor.
arXiv Detail & Related papers (2022-02-05T01:59:09Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)