Scalable Algorithms for Individual Preference Stable Clustering
- URL: http://arxiv.org/abs/2403.10365v1
- Date: Fri, 15 Mar 2024 14:58:27 GMT
- Title: Scalable Algorithms for Individual Preference Stable Clustering
- Authors: Ron Mosenzon, Ali Vakilian
- Abstract summary: In this paper, we study the natural local search algorithm for IP stable clustering.
Our analysis confirms an $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input.
- Score: 8.01184332330228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study individual preference (IP) stability, a notion capturing individual fairness and stability in clustering. Within this setting, a clustering is $\alpha$-IP stable when each data point's average distance to its cluster is no more than $\alpha$ times its average distance to any other cluster. We study the natural local search algorithm for IP stable clustering. Our analysis confirms an $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input. Furthermore, by refining the local search approach, we show that it runs in almost linear time, $\tilde{O}(nk)$.
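To make the $\alpha$-IP stability condition concrete, here is a minimal sketch (not the paper's code; it assumes Euclidean distances and NumPy arrays, and the function name is purely illustrative) that computes the smallest $\alpha$ for which a given clustering is $\alpha$-IP stable:

```python
import numpy as np

def ip_stability_factor(X, labels):
    """Smallest alpha for which the clustering (X, labels) is alpha-IP stable.

    A clustering is alpha-IP stable when every point's average distance to
    the other points of its own cluster is at most alpha times its average
    distance to the points of any other cluster.
    """
    alpha = 0.0
    clusters = np.unique(labels)
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)   # distances from point i
        own = labels == labels[i]
        own[i] = False                             # exclude the point itself
        if not own.any():                          # singleton cluster: trivially stable
            continue
        own_avg = dists[own].mean()
        for c in clusters:
            if c == labels[i]:
                continue
            other_avg = dists[labels == c].mean()
            alpha = max(alpha, own_avg / max(other_avg, 1e-12))
    return alpha
```

Roughly speaking, the local search analyzed in the paper repeatedly reassigns points whose ratio is too large; the checker above only measures the worst ratio of a fixed clustering.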
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
Given $n$ points $P$ and a parameter $k$, let $\delta(x)$ be the minimum radius such that the ball of radius $\delta(x)$ around $x$ contains at least $n/k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
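For intuition, here is a minimal sketch (illustrative only; assumes Euclidean distances, NumPy arrays, and the $\delta(x)$ definition above, with hypothetical function names) that checks whether a set of centers is individually fair:

```python
import numpy as np

def fairness_radii(X, k):
    """delta(x): the smallest radius whose ball around x holds at least
    ceil(n/k) of the n input points (x itself counts here; conventions
    vary by one point across papers)."""
    n = len(X)
    m = int(np.ceil(n / k))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, m - 1]   # column 0 is x itself (distance 0)

def is_individually_fair(X, centers, k):
    """True iff every point x has some center within distance delta(x)."""
    delta = fairness_radii(X, k)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return bool((d.min(axis=1) <= delta).all())
```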
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Constant Approximation for Individual Preference Stable Clustering [26.716316367717585]
Individual preference (IP) stability is a natural clustering objective inspired by stability and fairness constraints.
It was unknown if an $o(n)$-IP stable clustering always exists, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering.
We show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering.
arXiv Detail & Related papers (2023-09-28T20:42:46Z)
- Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Dependent Cluster Mapping (DCMAP): Optimal clustering of directed acyclic graphs for statistical inference [0.0]
A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference.
We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters.
For 25- and 50-node DBNs, the search space sizes were $9.91\times 10^{9}$ and $1.51\times 10^{21}$ possible cluster mappings, respectively, and the first optimal solution was found at iteration 934 (95% CI 926, 971).
arXiv Detail & Related papers (2023-08-08T01:01:37Z)
- Approximate spectral clustering with eigenvector selection and self-tuned $k$ [1.827510863075184]
The recently emerged spectral clustering surpasses conventional clustering methods by detecting clusters of any shape without requiring a convexity assumption.
In practice, manual setting of $k$ could be subjective or time consuming.
The proposed algorithm uses two relevance metrics for estimating $k$ in two vital steps of approximate spectral clustering (ASC).
Experiments demonstrate the efficiency of the proposed algorithm and its ability to compete with similar methods in which $k$ was set manually; a generic self-tuning heuristic is sketched below.
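The summary does not spell out the two relevance metrics, so as a stand-in, here is the classic eigengap heuristic for self-tuning $k$, which is in the same spirit (a sketch only; assumes a dense symmetric similarity matrix and NumPy):

```python
import numpy as np

def estimate_k_by_eigengap(S, k_max=10):
    """Pick k at the largest gap among the smallest eigenvalues of the
    symmetric normalized Laplacian of similarity matrix S."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))[: k_max + 1]
    gaps = np.diff(eigvals)            # gaps[i] = lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1    # largest gap after the k-th eigenvalue
```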
arXiv Detail & Related papers (2023-02-22T11:32:24Z)
- Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for error rate $\alpha$ up to $1/2$, improving over previous work, which required $\alpha$ to be at most $1/7$.
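As a rough sketch of the learning-augmented setting (not the paper's actual procedure; the winsorized mean below is a simple stand-in for the more careful coordinate-wise robust estimators used in this line of work, and all names are hypothetical):

```python
import numpy as np

def centers_from_noisy_predictions(X, pred_labels, k, alpha=0.2):
    """Recover k centers from predicted labels in which up to an alpha
    fraction of each cluster's points may be mislabeled. A plain centroid
    is fragile to such errors, so each coordinate is winsorized at the
    alpha and (1 - alpha) quantiles before averaging. Assumes every
    predicted cluster is non-empty."""
    centers = np.zeros((k, X.shape[1]))
    for c in range(k):
        pts = X[pred_labels == c]
        lo, hi = np.quantile(pts, [alpha, 1.0 - alpha], axis=0)
        centers[c] = np.clip(pts, lo, hi).mean(axis=0)
    return centers
```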
arXiv Detail & Related papers (2022-10-31T03:00:11Z)
- Individual Preference Stability for Clustering [24.301650646957246]
We propose a natural notion of individual preference (IP) stability for clustering.
Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster.
arXiv Detail & Related papers (2022-07-07T22:01:01Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^{2} \log^{2} n / \epsilon^{2})$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few similarity queries, the problem becomes easier to solve.
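For reference, the soft objective the entry alludes to is the standard fuzzy c-means cost; below is a generic sketch of it (not the paper's query-based algorithm), with the usual membership weights:

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Soft assignments: point i belongs to center j with weight proportional
    to dist(i, j)^(-2/(m-1)), normalized so each row sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    w = np.maximum(d, 1e-12) ** (-2.0 / (m - 1.0))   # guard against zero distance
    return w / w.sum(axis=1, keepdims=True)

def fuzzy_objective(X, centers, m=2.0):
    """Fuzzy k-means cost: sum over i, j of u_ij^m * ||x_i - c_j||^2."""
    u = fuzzy_memberships(X, centers, m)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((u ** m) * d2).sum())
```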
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.