Can Evolutionary Clustering Have Theoretical Guarantees?
- URL: http://arxiv.org/abs/2212.01771v2
- Date: Sat, 22 Jul 2023 04:12:30 GMT
- Title: Can Evolutionary Clustering Have Theoretical Guarantees?
- Authors: Chao Qian
- Abstract summary: Evolutionary clustering has found many successful applications, but all the results are empirical, lacking theoretical support.
This paper fills this gap by proving that the approximation performance of the GSEMO can be theoretically guaranteed.
- Score: 21.343803954998915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is a fundamental problem in many areas. It aims to partition a
given data set into groups based on some distance measure, such that data
points in the same group are similar while those in different groups are
dissimilar. Due to its importance and NP-hardness, many methods have been
proposed, among which evolutionary algorithms are a popular class.
Evolutionary clustering has found many successful applications, but all the
results are empirical, lacking theoretical support. This paper fills this gap
by proving that the approximation performance of the GSEMO (a simple
multi-objective evolutionary algorithm) for solving four formulations of
clustering, i.e., $k$-tMM, $k$-center, discrete $k$-median and $k$-means, can
be theoretically guaranteed. Furthermore, we consider clustering under
fairness, which aims to avoid algorithmic bias and has recently become an
important research topic in machine learning. We prove that for discrete
$k$-median clustering under individual fairness, the approximation performance
of the GSEMO can be theoretically guaranteed with respect to both the objective
function and the fairness constraint.
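For intuition, here is a minimal sketch of the kind of algorithm the paper analyzes: GSEMO keeps a set of mutually non-dominated solutions, mutates a randomly chosen one by flipping each bit with probability 1/n, and accepts the offspring only if no current solution weakly dominates it. The bi-objective encoding below (discrete k-median cost, subset size) and all parameter choices are illustrative assumptions, not the paper's exact formulation.

```python
import random

def kmedian_cost(centers, dist):
    """Discrete k-median cost: each point's distance to its nearest chosen center, summed."""
    if not centers:
        return float("inf")
    return sum(min(row[j] for j in centers) for row in dist)

def weakly_dominates(a, b):
    """a weakly dominates b if a is no worse in both objectives (minimization)."""
    return a[0] <= b[0] and a[1] <= b[1]

def gsemo_kmedian(dist, k, iterations=20000, seed=0):
    """GSEMO on the bi-objective problem (k-median cost, number of centers)."""
    rng = random.Random(seed)
    n = len(dist)

    def objectives(bits):
        centers = [j for j in range(n) if bits[j]]
        return (kmedian_cost(centers, dist), len(centers))

    x = [0] * n                      # start from the empty subset
    pop = [(x, objectives(x))]
    for _ in range(iterations):
        parent, _ = pop[rng.randrange(len(pop))]
        # Standard bit-wise mutation: flip each bit independently with prob. 1/n.
        child = [b ^ (rng.random() < 1.0 / n) for b in parent]
        f_child = objectives(child)
        if any(weakly_dominates(f, f_child) for _, f in pop):
            continue                 # offspring rejected
        # Keep the offspring and drop solutions it now weakly dominates.
        pop = [(s, f) for s, f in pop if not weakly_dominates(f_child, f)]
        pop.append((child, f_child))
    # Report the best Pareto-front solution that uses at most k centers.
    feasible = [(s, f) for s, f in pop if f[1] <= k]
    return min(feasible, key=lambda sf: sf[1][0])
```

On a small distance matrix this returns a subset of at most $k$ medians; the paper's contribution is proving that such runs reach guaranteed approximation ratios.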
Related papers
- Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
While in some applications all data points can be chosen as centers, in the general setting centers must be chosen from a subset of points, referred to as facilities or suppliers.
In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where the data consists of several groups and a minimum number of centers must be selected from each group (sketched below).
arXiv Detail & Related papers (2024-10-16T18:00:19Z)
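As a quick illustration of the fair $k$-supplier setup in the entry above, here is a hedged sketch of the objective and the group-quota feasibility check; the names (`group_of`, `quota`, the distance callable `d`) are hypothetical, chosen only for this example.

```python
from collections import Counter

def ksupplier_cost(clients, facilities, chosen, d):
    """k-supplier objective: the maximum distance from any client
    to its nearest chosen facility (smaller is better)."""
    return max(min(d(c, facilities[j]) for j in chosen) for c in clients)

def satisfies_group_quotas(chosen, group_of, quota):
    """Fairness constraint: at least quota[g] of the chosen centers
    must come from each group g."""
    counts = Counter(group_of[j] for j in chosen)
    return all(counts[g] >= q for g, q in quota.items())
```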
- Asymptotics for The $k$-means [0.6091702876917281]
The $k$-means is one of the most important unsupervised learning techniques in statistics and computer science.
The proposed notion of clustering consistency is more appropriate for clustering methods than the previously used criterion consistency.
It is found that the proposed $k$-means method has lower clustering error rates and is more robust to small clusters and outliers.
arXiv Detail & Related papers (2022-11-18T03:36:58Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
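For intuition on the clustering-based pseudo-labeling described in the entry above, a minimal sketch follows: embeddings are clustered and the cluster indices serve as pseudo-labels for constructing supervised tasks. The use of scikit-learn's KMeans and the parameter values are assumptions for illustration; CACTUs' actual pipeline differs in detail.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(embeddings: np.ndarray, n_clusters: int = 10, seed: int = 0):
    """Cluster unlabeled embeddings; the cluster indices become pseudo-labels,
    on which a downstream (supervised) meta-learner can then train."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return km.fit_predict(embeddings)
```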
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees (sketched after this entry).
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
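A hedged sketch of the clustering-based aggregation step mentioned above: each user sends one locally computed estimate, and the server clusters those estimates and averages within each cluster. The choice of KMeans and the function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def one_shot_clustered_aggregation(local_estimates, n_clusters):
    """Server-side step: cluster the users' local estimates, then return
    one averaged model per cluster (a single communication round)."""
    X = np.stack(local_estimates)                      # shape: (n_users, dim)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return [X[labels == c].mean(axis=0) for c in range(n_clusters)]
```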
- Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters.
Differing clustering costs per point across sensitive groups can lead to fairness-related harms.
We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering (the min-max objective is sketched after this entry).
arXiv Detail & Related papers (2022-08-22T07:10:17Z)
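For concreteness, here is one standard formulation of a socially fair center-based objective: minimize the maximum, over sensitive groups, of that group's average cost. This is a sketch under that assumption; the paper's unified framework is more general.

```python
import numpy as np

def socially_fair_kmeans_cost(X, centers, group):
    """Max over sensitive groups of the group's average squared distance
    to its nearest center; a socially fair solution keeps this maximum small."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    return max(float(d2[group == g].mean()) for g in np.unique(group))
```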
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few queries, the problem becomes easier to solve (the fuzzy objective is sketched after this entry).
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
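As a reference point for the entry above, here is the standard fuzzy $c$-means objective that generalizes $k$-means, written as a short sketch; the fuzzifier $m$ and the variable names are conventional choices, not taken from the paper.

```python
import numpy as np

def fuzzy_cmeans_objective(X, centers, U, m=2.0):
    """Fuzzy objective: sum_{i,j} U[i, j]**m * ||x_i - c_j||^2, where U is the
    (n_points, n_clusters) membership matrix with rows summing to 1.
    With hard 0/1 memberships this reduces to the k-means objective."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return float((U ** m * d2).sum())
```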
- Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs).
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z)
- Protecting Individual Interests across Clusters: Spectral Clustering with Guarantees [20.350342151402963]
We propose an individual fairness criterion for clustering a graph $\mathcal{G}$ that requires each cluster to contain an adequate number of members connected to the individual.
We devise a spectral clustering algorithm to find fair clusters under a given representation graph (one reading of the criterion is sketched after this entry).
arXiv Detail & Related papers (2021-05-08T15:03:25Z)
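A hedged reading of the criterion above: every individual should have enough of its representatives (neighbors in the representation graph) inside each cluster, e.g., roughly in proportion to the cluster's size. The proportional rule and the tolerance below are assumptions for illustration; the paper's exact criterion may differ.

```python
import numpy as np

def individually_fair(adj, labels, tol=0.2):
    """Check, for every node i and every cluster c, that the fraction of i's
    neighbors placed in c is within `tol` of the cluster's overall share of nodes.
    `adj` is a 0/1 adjacency matrix; `labels` is an array of cluster labels."""
    n = len(labels)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size == 0:
            continue
        for c in np.unique(labels):
            share = np.mean(labels == c)            # cluster's share of all nodes
            nbr_share = np.mean(labels[nbrs] == c)  # share among i's representatives
            if abs(nbr_share - share) > tol:
                return False
    return True
```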
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Fair Hierarchical Clustering [92.03780518164108]
We define a notion of fairness that mitigates over-representation in traditional clustering (a balance-style check is sketched after this entry).
We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv Detail & Related papers (2020-06-18T01:05:11Z)
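To make "mitigating over-representation" concrete, here is a common balance-style check from the fair clustering literature: no protected group may exceed a cap on its fraction of any cluster. The cap `alpha` and the function are illustrative assumptions, not this paper's exact definition.

```python
from collections import Counter

def no_over_representation(clusters, color_of, alpha=0.5):
    """Return True if, in every cluster, no single color (protected group)
    makes up more than an alpha fraction of that cluster's points."""
    for cluster in clusters:
        counts = Counter(color_of[p] for p in cluster)
        if max(counts.values()) > alpha * len(cluster):
            return False
    return True
```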