Constant Approximation for Individual Preference Stable Clustering
- URL: http://arxiv.org/abs/2309.16840v1
- Date: Thu, 28 Sep 2023 20:42:46 GMT
- Title: Constant Approximation for Individual Preference Stable Clustering
- Authors: Anders Aamand, Justin Y. Chen, Allen Liu, Sandeep Silwal, Pattara
Sukprasert, Ali Vakilian, Fred Zhang
- Abstract summary: Individual preference (IP) stability is a natural clustering objective inspired by stability and fairness constraints.
It was unknown if an $o(n)$-IP stable clustering always \emph{exists}, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering.
We show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering.
- Score: 26.716316367717585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Individual preference (IP) stability, introduced by Ahmadi et al. (ICML
2022), is a natural clustering objective inspired by stability and fairness
constraints. A clustering is $\alpha$-IP stable if the average distance of
every data point to its own cluster is at most $\alpha$ times the average
distance to any other cluster. Unfortunately, determining if a dataset admits a
$1$-IP stable clustering is NP-Hard. Moreover, before this work, it was unknown
if an $o(n)$-IP stable clustering always \emph{exists}, as the prior state of
the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in
understanding and show that an $O(1)$-IP stable clustering always exists for
general metrics, and we give an efficient algorithm which outputs such a
clustering. We also introduce generalizations of IP stability beyond average
distance and give efficient, near-optimal algorithms in the cases where we
consider the maximum and minimum distances within and between clusters.
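To make the definition concrete, the following Python sketch computes the smallest $\alpha$ for which a given clustering is $\alpha$-IP stable under the Euclidean metric. It is an illustration of the definition only, not code from the paper; the function name and interface are our own.

```python
import numpy as np

def ip_stability_factor(X, labels):
    """Smallest alpha for which the clustering given by `labels` is
    alpha-IP stable: for every point, the average distance to its own
    cluster must be at most alpha times the average distance to every
    other cluster."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n = len(X)
    # Pairwise Euclidean distances; the definition works for any metric.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    alpha = 0.0
    for i in range(n):
        not_i = np.arange(n) != i
        same = dists[i][(labels == labels[i]) & not_i]
        if same.size == 0:
            continue  # singleton cluster: the point is trivially stable
        d_own = same.mean()
        for c in np.unique(labels):
            if c == labels[i]:
                continue
            d_other = dists[i][labels == c].mean()
            if d_other > 0:
                alpha = max(alpha, d_own / d_other)
    return alpha
```

On the clustering returned by the paper's algorithm, this quantity is guaranteed to be $O(1)$ for any metric.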
Related papers
- Scalable Algorithms for Individual Preference Stable Clustering [8.01184332330228]
In this paper, we study the natural local search algorithm for IP stable clustering.
Our analysis confirms an $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input (a hedged sketch of one such local-search step appears below).
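For intuition, here is a minimal sketch of one plausible local-search step: any point violating $\beta$-IP stability is moved to the cluster it is closest to on average. The threshold $\beta$, the move rule, and the stopping criterion are our own illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

def local_search_ip(X, labels, beta=2.0, max_iters=100):
    """Repeatedly move any point whose average in-cluster distance
    exceeds beta times its average distance to some other cluster into
    its most attractive cluster; stop when no point wants to move."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels).copy()
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for _ in range(max_iters):
        moved = False
        for i in range(n):
            not_i = np.arange(n) != i
            # Average distance from point i to every nonempty cluster.
            avg = {c: dists[i][(labels == c) & not_i].mean()
                   for c in np.unique(labels)
                   if ((labels == c) & not_i).any()}
            if labels[i] not in avg or len(avg) < 2:
                continue
            best = min(avg, key=avg.get)
            if avg[labels[i]] > beta * avg[best]:
                labels[i] = best  # move the violating point
                moved = True
        if not moved:
            break  # every point is now beta-IP stable
    return labels
```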
arXiv Detail & Related papers (2024-03-15T14:58:27Z)
- Dependent Cluster Mapping (DCMAP): Optimal clustering of directed acyclic graphs for statistical inference [0.0]
A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference.
We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters.
For 25- and 50-node DBNs, the search spaces contained $9.91\times 10^{9}$ and $1.51\times 10^{21}$ possible cluster mappings, respectively, and the first optimal solution was found at iteration 934 (95% CI 926, 971).
arXiv Detail & Related papers (2023-08-08T01:01:37Z)
- Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset $V$.
We propose a private $(1+o(1))$-approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Individual Preference Stability for Clustering [24.301650646957246]
We propose a natural notion of individual preference (IP) stability for clustering.
Our notion asks that every data point is, on average, closer to the points in its own cluster than to the points in any other cluster.
arXiv Detail & Related papers (2022-07-07T22:01:01Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few such queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $\tau$ clustering and adaptive $\tau$ clustering (a sketch of the underlying expectile computation appears below).
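For reference, the $\tau$-expectile of a sample minimizes an asymmetric squared loss and plays the role that the mean plays in $k$-means ($\tau = 0.5$ recovers the mean). The helper below, using a standard reweighting fixed-point iteration, is an illustrative sketch rather than code from the paper:

```python
import numpy as np

def expectile(x, tau, tol=1e-10, max_iters=200):
    """tau-expectile of a 1-D sample: the minimizer e of
    sum_i |tau - 1[x_i <= e]| * (x_i - e)**2, found by iteratively
    reweighted averaging (tau = 0.5 returns the ordinary mean)."""
    x = np.asarray(x, dtype=float)
    e = x.mean()
    for _ in range(max_iters):
        w = np.where(x > e, tau, 1.0 - tau)  # asymmetric weights
        e_new = np.sum(w * x) / np.sum(w)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e
```

A k-expectiles assignment step would then compare points against per-cluster expectiles instead of means; as the names suggest, the fixed-$\tau$ scheme presumably uses a preset $\tau$ throughout, while the adaptive scheme tunes it.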
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves (a bare-bones sketch of the threshold-tree idea appears below).
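As a rough illustration of how a tree with $k$ leaves can define clusters, the sketch below recursively splits a set of precomputed, distinct cluster centers with axis-aligned cuts until each leaf holds one center. This is our own simplified construction; the paper's algorithm chooses each cut more carefully so as to bound the resulting clustering cost.

```python
import numpy as np

def build_threshold_tree(centers):
    """Recursively split distinct cluster centers with axis-aligned cuts
    until every leaf holds exactly one center (k leaves for k centers)."""
    centers = np.asarray(centers, dtype=float)
    if len(centers) == 1:
        return {"center": centers[0]}
    # Cut along the coordinate where the centers are most spread out,
    # at the midpoint of the extreme center values on that coordinate.
    dim = int(np.argmax(centers.max(axis=0) - centers.min(axis=0)))
    theta = (centers[:, dim].min() + centers[:, dim].max()) / 2.0
    return {"dim": dim, "theta": theta,
            "left": build_threshold_tree(centers[centers[:, dim] <= theta]),
            "right": build_threshold_tree(centers[centers[:, dim] > theta])}

def assign(tree, x):
    """Route a point down the tree; the leaf's center is its cluster."""
    while "center" not in tree:
        tree = tree["left"] if x[tree["dim"]] <= tree["theta"] else tree["right"]
    return tree["center"]
```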
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.