Linear Programming based Approximation to Individually Fair k-Clustering with Outliers
- URL: http://arxiv.org/abs/2412.10923v1
- Date: Sat, 14 Dec 2024 18:16:46 GMT
- Title: Linear Programming based Approximation to Individually Fair k-Clustering with Outliers
- Authors: Binita Maity, Shrutimoy Das, Anirban Dasgupta
- Abstract summary: We develop an individually fair $k$-means clustering algorithm for datasets that contain outliers.
For each point which is not an outlier, there must be a center within the $\frac{n}{k}$ nearest neighbours of the given point.
We provide theoretical guarantees that our method yields a provable approximation of both the fair radius and the clustering cost.
- Score: 1.4460482276563802
- Abstract: Individual fairness guarantees are often desirable properties to have, but they become hard to formalize when the dataset contains outliers. Here, we investigate the problem of developing an individually fair $k$-means clustering algorithm for datasets that contain outliers. That is, given $n$ points and $k$ centers, we want that for each point which is not an outlier, there must be a center within the $\frac{n}{k}$ nearest neighbours of the given point. While a few of the recent works have looked into individually fair clustering, this is the first work that explores this problem in the presence of outliers for $k$-means clustering. For this purpose, we define and solve a linear program (LP) that helps us identify the outliers. We exclude these outliers from the dataset and apply a rounding algorithm that computes the $k$ centers, such that the fairness constraint of the remaining points is satisfied. We also provide theoretical guarantees that our method leads to a guaranteed approximation of the fair radius as well as the clustering cost. We also demonstrate our techniques empirically on real-world datasets.
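To make the constraint concrete: with $n$ points and $k$ centers, the fair radius of a point is its distance to its $\lceil n/k \rceil$-th nearest neighbour, and every non-outlier must have some center within that radius. Below is a minimal Python/NumPy sketch of this check. It only illustrates the constraint, not the paper's LP-and-rounding algorithm; the function names, the Euclidean metric, and the self-excluded neighbour convention are assumptions.

```python
import numpy as np

def fair_radius(X, k):
    # delta(x): distance from x to its ceil(n/k)-th nearest neighbour
    # (self excluded); the constraint asks for a center within this radius
    # for every point that is not an outlier.
    n = len(X)
    m = min(int(np.ceil(n / k)), n - 1)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)  # column 0 is each point's distance to itself (0)
    return D[:, m]

def satisfies_individual_fairness(X, centers, outliers, k):
    # Verify the fairness constraint for every point not flagged as an outlier.
    delta = fair_radius(X, k)
    d_center = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).min(axis=1)
    keep = np.ones(len(X), dtype=bool)
    keep[list(outliers)] = False
    return bool(np.all(d_center[keep] <= delta[keep]))

# Example: 200 random points, 5 centers picked from the data,
# two arbitrary points declared outliers.
X = np.random.rand(200, 2)
centers = X[np.random.choice(200, 5, replace=False)]
print(satisfies_individual_fairness(X, centers, outliers=[3, 17], k=5))
```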
Related papers
- Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems [8.74967598360817]
Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group (a minimal checker for this kind of constraint is sketched after this list).
We propose a novel "Relax and Merge" framework, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm.
If equipped with a PTAS for $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints.
arXiv Detail & Related papers (2024-11-02T02:50:12Z) - Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
While in some applications all data points can be chosen as centers, in the general setting centers must be chosen from a subset of points, referred to as facilities or suppliers.
In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group.
arXiv Detail & Related papers (2024-10-16T18:00:19Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - A Unified Framework for Center-based Clustering of Distributed Data [46.86543102499174]
We develop a family of distributed center-based clustering algorithms that work over networks of users.
Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss.
For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds [17.226362076527764]
Centroid-based clustering methods such as k-means, k-medoids and k-centers are widely applied as go-to tools in exploratory data analysis.
In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset.
We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by centroids.
arXiv Detail & Related papers (2022-03-04T18:59:02Z) - Distributed k-Means with Outliers in General Metrics [0.6117371161379208]
We present a distributed coreset-based 3-round approximation algorithm for k-means with $z$ outliers for general metric spaces.
An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by the doubling dimension $D$ of the metric space.
arXiv Detail & Related papers (2022-02-16T16:24:31Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Structures of Spurious Local Minima in $k$-means [20.155509538529568]
We investigate the structures of spurious local solutions to the $k$-means problem.
We prove that this is essentially the only type of spurious local minimum under a separation condition.
arXiv Detail & Related papers (2020-02-16T22:15:03Z)
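For comparison with the individual-fairness constraint of the main paper, the group-fairness notion mentioned in the Relax and Merge entry above can be checked just as directly. The sketch below is a hypothetical checker that reads "a proportion of points from each group" as staying within a tolerance of the dataset-wide group proportions; the exact form of the proportion bounds used in that paper may differ, and all names here are illustrative.

```python
import numpy as np

def group_proportions(assignments, groups, k):
    # Per-cluster fraction of points from each group; assignments[i] is the
    # cluster index of point i, groups[i] its (hypothetical) group label.
    assignments = np.asarray(assignments)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    props = np.zeros((k, len(labels)))
    for c in range(k):
        members = groups[assignments == c]
        if len(members) == 0:
            continue  # empty cluster: proportions stay at zero
        props[c] = [np.mean(members == g) for g in labels]
    return labels, props

def is_proportionally_fair(assignments, groups, k, tol=0.05):
    # True if every cluster's group proportions are within `tol` of the
    # dataset-wide proportions (one simple way to read the constraint).
    labels, props = group_proportions(assignments, groups, k)
    global_props = np.array([np.mean(np.asarray(groups) == g) for g in labels])
    return bool(np.all(np.abs(props - global_props) <= tol))
```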
This list is automatically generated from the titles and abstracts of the papers on this site.