Scalable Algorithms for Individual Preference Stable Clustering
- URL: http://arxiv.org/abs/2403.10365v1
- Date: Fri, 15 Mar 2024 14:58:27 GMT
- Title: Scalable Algorithms for Individual Preference Stable Clustering
- Authors: Ron Mosenzon, Ali Vakilian
- Abstract summary: In this paper, we study the natural local search algorithm for IP stable clustering.
Our analysis confirms an $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input.
- Score: 8.01184332330228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study individual preference (IP) stability, a notion capturing individual fairness and stability in clustering. Within this setting, a clustering is $\alpha$-IP stable when each data point's average distance to its cluster is no more than $\alpha$ times its average distance to any other cluster. We study the natural local search algorithm for IP stable clustering. Our analysis confirms an $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input. Furthermore, by refining the local search approach, we show that it runs in almost linear time, $\tilde{O}(nk)$.
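To make the $\alpha$-IP stability condition concrete, here is a minimal sketch (not the paper's code; it assumes Euclidean distances and NumPy arrays, and the function name is purely illustrative) that computes the smallest $\alpha$ for which a given clustering is $\alpha$-IP stable:

```python
import numpy as np

def ip_stability_factor(X, labels):
    """Smallest alpha for which the clustering (X, labels) is alpha-IP stable.

    A clustering is alpha-IP stable when every point's average distance to
    the other points of its own cluster is at most alpha times its average
    distance to the points of any other cluster.
    """
    alpha = 0.0
    clusters = np.unique(labels)
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)   # distances from point i
        own = labels == labels[i]
        own[i] = False                             # exclude the point itself
        if not own.any():                          # singleton cluster: trivially stable
            continue
        own_avg = dists[own].mean()
        for c in clusters:
            if c == labels[i]:
                continue
            other_avg = dists[labels == c].mean()
            alpha = max(alpha, own_avg / max(other_avg, 1e-12))
    return alpha
```

Roughly speaking, the local search analyzed in the paper repeatedly reassigns points whose ratio is too large; the checker above only measures the worst ratio of a fixed clustering.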
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
Given $n$ points $P$ and a parameter $k$, let $\delta(x)$ be the minimum radius such that the ball of radius $\delta(x)$ around $x$ contains at least $n/k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
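For intuition, here is a minimal sketch (illustrative only; assumes Euclidean distances, NumPy arrays, and the $\delta(x)$ definition above, with hypothetical function names) that checks whether a set of centers is individually fair:

```python
import numpy as np

def fairness_radii(X, k):
    """delta(x): the smallest radius whose ball around x holds at least
    ceil(n/k) of the n input points (x itself counts here; conventions
    vary by one point across papers)."""
    n = len(X)
    m = int(np.ceil(n / k))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, m - 1]   # column 0 is x itself (distance 0)

def is_individually_fair(X, centers, k):
    """True iff every point x has some center within distance delta(x)."""
    delta = fairness_radii(X, k)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return bool((d.min(axis=1) <= delta).all())
```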
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Constant Approximation for Individual Preference Stable Clustering [26.716316367717585]
Individual preference (IP) stability is a natural clustering objective inspired by stability and fairness constraints.
It was unknown if an $o(n)$-IP stable clustering always exists, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering.
We show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering.
arXiv Detail & Related papers (2023-09-28T20:42:46Z)
- Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Dependent Cluster Mapping (DCMAP): Optimal clustering of directed acyclic graphs for statistical inference [0.0]
A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference.
We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters.
For 25- and 50-node DBNs, the search space sizes were $9.91\times 10^{9}$ and $1.51\times 10^{21}$ possible cluster mappings, respectively, and the first optimal solution was found at iteration 934 (95% CI 926, 971).
arXiv Detail & Related papers (2023-08-08T01:01:37Z)
- Approximate spectral clustering with eigenvector selection and self-tuned $k$ [1.827510863075184]
The recently emerged spectral clustering surpasses conventional clustering methods by detecting clusters of any shape without requiring a convexity assumption.
In practice, manual setting of $k$ could be subjective or time consuming.
The proposed algorithm uses two relevance metrics for estimating $k$ in two vital steps of approximate spectral clustering (ASC).
Experiments demonstrate the efficiency of the proposed algorithm and its ability to compete with similar methods in which $k$ was set manually; a generic self-tuning heuristic is sketched below.
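The summary does not spell out the two relevance metrics, so as a stand-in, here is the classic eigengap heuristic for self-tuning $k$, which is in the same spirit (a sketch only; assumes a dense symmetric similarity matrix and NumPy):

```python
import numpy as np

def estimate_k_by_eigengap(S, k_max=10):
    """Pick k at the largest gap among the smallest eigenvalues of the
    symmetric normalized Laplacian of similarity matrix S."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))[: k_max + 1]
    gaps = np.diff(eigvals)            # gaps[i] = lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1    # largest gap after the k-th eigenvalue
```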
arXiv Detail & Related papers (2023-02-22T11:32:24Z)
- Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for error rate $\alpha$ up to $1/2$, improving over previous work, which required $\alpha$ to be at most $1/7$.
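As a rough sketch of the learning-augmented setting (not the paper's actual procedure; the winsorized mean below is a simple stand-in for the more careful coordinate-wise robust estimators used in this line of work, and all names are hypothetical):

```python
import numpy as np

def centers_from_noisy_predictions(X, pred_labels, k, alpha=0.2):
    """Recover k centers from predicted labels in which up to an alpha
    fraction of each cluster's points may be mislabeled. A plain centroid
    is fragile to such errors, so each coordinate is winsorized at the
    alpha and (1 - alpha) quantiles before averaging. Assumes every
    predicted cluster is non-empty."""
    centers = np.zeros((k, X.shape[1]))
    for c in range(k):
        pts = X[pred_labels == c]
        lo, hi = np.quantile(pts, [alpha, 1.0 - alpha], axis=0)
        centers[c] = np.clip(pts, lo, hi).mean(axis=0)
    return centers
```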
arXiv Detail & Related papers (2022-10-31T03:00:11Z)
- Individual Preference Stability for Clustering [24.301650646957246]
We propose a natural notion of individual preference (IP) stability for clustering.
Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster.
arXiv Detail & Related papers (2022-07-07T22:01:01Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^{2} \log^{2} n / \epsilon^{2})$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few similarity queries, the problem becomes easier to solve.
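For reference, the soft objective the entry alludes to is the standard fuzzy c-means cost; below is a generic sketch of it (not the paper's query-based algorithm), with the usual membership weights:

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Soft assignments: point i belongs to center j with weight proportional
    to dist(i, j)^(-2/(m-1)), normalized so each row sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    w = np.maximum(d, 1e-12) ** (-2.0 / (m - 1.0))   # guard against zero distance
    return w / w.sum(axis=1, keepdims=True)

def fuzzy_objective(X, centers, m=2.0):
    """Fuzzy k-means cost: sum over i, j of u_ij^m * ||x_i - c_j||^2."""
    u = fuzzy_memberships(X, centers, m)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((u ** m) * d2).sum())
```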
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.