Heterogeneity for the Win: One-Shot Federated Clustering
- URL: http://arxiv.org/abs/2103.00697v1
- Date: Mon, 1 Mar 2021 02:17:33 GMT
- Title: Heterogeneity for the Win: One-Shot Federated Clustering
- Authors: Don Kurian Dennis, Tian Li and Virginia Smith
- Abstract summary: We develop and analyze a one-shot federated clustering scheme, $k$-FED, based on the widely-used Lloyd's method for $k$-means clustering.
We show that the issue of statistical heterogeneity in federated networks can in fact benefit our analysis.
- Score: 8.64969514480008
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we explore the unique challenges -- and opportunities -- of
unsupervised federated learning (FL). We develop and analyze a one-shot
federated clustering scheme, $k$-FED, based on the widely-used Lloyd's method
for $k$-means clustering. In contrast to many supervised problems, we show that
the issue of statistical heterogeneity in federated networks can in fact
benefit our analysis. We analyze $k$-FED under a center separation assumption
and compare it to the best known requirements of its centralized counterpart.
Our analysis shows that in heterogeneous regimes where the number of clusters
per device $(k')$ is smaller than the total number of clusters over the network
$k$, $(k'\le \sqrt{k})$, we can use heterogeneity to our advantage --
significantly weakening the cluster separation requirements for $k$-FED. From a
practical viewpoint, $k$-FED also has many desirable properties: it requires
only one round of communication, can run asynchronously, and can handle partial
participation or node/network failures. We motivate our analysis with
experiments on common FL benchmarks, and highlight the practical utility of
one-shot clustering through use-cases in personalized FL and device sampling.
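The abstract describes the $k$-FED pipeline but gives no code; below is a minimal sketch, assuming scikit-learn's KMeans as a stand-in for both Lloyd steps. Each device clusters its own data into $k'$ local centers, and the server clusters the pooled centers into $k$ global ones in a single round. The function names, the synthetic data, and the use of plain k-means for the server-side aggregation (the paper's actual central step is a specific procedure on the pooled local centers, not reproduced here) are all illustrative assumptions.

```python
# Minimal one-shot federated clustering sketch in the spirit of k-FED.
# Assumptions: scikit-learn's KMeans stands in for both Lloyd steps; the
# paper's actual server-side procedure on pooled centers is not reproduced.
import numpy as np
from sklearn.cluster import KMeans

def local_step(X: np.ndarray, k_prime: int) -> np.ndarray:
    """Device side: run Lloyd's method locally, return k' local centers."""
    return KMeans(n_clusters=k_prime, n_init=10).fit(X).cluster_centers_

def server_step(local_centers: list, k: int) -> np.ndarray:
    """Server side: cluster the pooled local centers into k global centers."""
    pooled = np.vstack(local_centers)  # one upload per device, one round total
    return KMeans(n_clusters=k, n_init=10).fit(pooled).cluster_centers_

rng = np.random.default_rng(0)
means = np.array([[0., 0.], [8., 0.], [0., 8.], [8., 8.]])  # k = 4 true clusters
pairs = [(0, 1), (2, 3), (0, 2)]  # heterogeneity: each device sees k' = 2 <= sqrt(4)
devices = [np.vstack([rng.normal(means[i], 1.0, (100, 2)) for i in pair])
           for pair in pairs]
global_centers = server_step([local_step(X, k_prime=2) for X in devices], k=4)
print(global_centers)
```

Because each device communicates exactly once, a straggler or failed node simply contributes no centers, which is consistent with the partial-participation and asynchrony properties claimed above.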
Related papers
- Dynamically Weighted Federated k-Means [0.0]
Federated clustering enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy.
We introduce a novel federated clustering algorithm named Dynamically Weighted Federated k-means (DWF k-means) based on Lloyd's method for k-means clustering.
We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure.
arXiv Detail & Related papers (2023-10-23T12:28:21Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces [59.33965805898736]
Clustered federated learning has been shown to produce promising results by grouping clients into clusters.
Existing clustered FL algorithms essentially try to group together clients with similar data distributions.
Prior FL algorithms infer these similarities only indirectly during training.
arXiv Detail & Related papers (2022-09-21T17:37:54Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that achieve utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $\tau$ clustering and adaptive $\tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
- An Efficient Framework for Clustered Federated Learning [26.24231986590374]
We address the problem of federated learning (FL) where users are distributed into clusters.
We propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes the cluster models via gradient descent (a minimal sketch of this alternation is given after this list).
We show that our algorithm is efficient in non-convex problems such as neural networks.
arXiv Detail & Related papers (2020-06-07T08:48:59Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
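As noted in the IFCA entry above, the algorithm alternates cluster-identity estimation with per-cluster model updates; here is a minimal sketch, assuming squared loss and linear models. The function name, learning rate, round count, and synthetic setup are hypothetical, not the authors' implementation.

```python
# Hypothetical IFCA-style alternation: each device selects the best-fitting
# global model (cluster-identity estimation) and takes a local gradient step;
# the server averages the returned updates within each estimated cluster.
import numpy as np

def ifca(device_data, k, dim, rounds=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    models = rng.normal(size=(k, dim))  # one model per cluster
    for _ in range(rounds):
        sums, counts = np.zeros_like(models), np.zeros(k)
        for X, y in device_data:
            losses = [np.mean((X @ w - y) ** 2) for w in models]
            j = int(np.argmin(losses))  # estimated cluster identity
            grad = 2 * X.T @ (X @ models[j] - y) / len(y)
            sums[j] += models[j] - lr * grad  # local update for cluster j
            counts[j] += 1
        for j in range(k):  # server: average updates per cluster
            if counts[j]:
                models[j] = sums[j] / counts[j]
    return models

# Eight devices, each holding data generated by one of two true models.
rng = np.random.default_rng(1)
true_w = np.array([[1.0, -2.0], [-3.0, 0.5]])
data = []
for i in range(8):
    X = rng.normal(size=(50, 2))
    data.append((X, X @ true_w[i % 2] + 0.1 * rng.normal(size=50)))
print(ifca(data, k=2, dim=2))
```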