Differentially Private Vertical Federated Clustering
- URL: http://arxiv.org/abs/2208.01700v2
- Date: Fri, 31 Mar 2023 02:55:04 GMT
- Title: Differentially Private Vertical Federated Clustering
- Authors: Zitao Li, Tianhao Wang, Ninghui Li
- Abstract summary: In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes.
To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques.
The algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering.
- Score: 13.27934054846057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many applications, multiple parties have private data regarding the same
set of users but on disjoint sets of attributes, and a server wants to leverage
the data to train a model. To enable model learning while protecting the
privacy of the data subjects, we need vertical federated learning (VFL)
techniques, where the data parties share only information for training the
model, instead of the private data. However, it is challenging to ensure that
the shared information maintains privacy while learning accurate models. To the
best of our knowledge, the algorithm proposed in this paper is the first
practical solution for differentially private vertical federated k-means
clustering, where the server can obtain a set of global centers with a provable
differential privacy guarantee. Our algorithm assumes an untrusted central
server that aggregates differentially private local centers and membership
encodings from local data parties. It builds a weighted grid as the synopsis of
the global dataset based on the received information. Final centers are
generated by running any k-means algorithm on the weighted grid. Our approach
for grid weight estimation uses a novel, light-weight, and differentially
private set intersection cardinality estimation algorithm based on the
Flajolet-Martin sketch. To improve the estimation accuracy in the setting with
more than two data parties, we further propose a refined version of the weights
estimation algorithm and a parameter tuning strategy to reduce the final
k-means utility to be close to that in the central private setting. We provide
theoretical utility analysis and experimental evaluation results for the
cluster centers computed by our algorithm and show that our approach performs
better both theoretically and empirically than the two baselines based on
existing techniques.
Related papers
- Personalized Federated Learning for Cross-view Geo-localization [49.40531019551957]
We propose a methodology combining Federated Learning (FL) with Cross-view Image Geo-localization (CVGL) techniques.
Our method implements a coarse-to-fine approach, where clients share only the coarse feature extractors while keeping fine-grained features specific to local environments.
Results demonstrate that our federated CVGL method achieves performance close to centralized training while maintaining data privacy.
arXiv Detail & Related papers (2024-11-07T13:25:52Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - A Novel Neural Network-Based Federated Learning System for Imbalanced
and Non-IID Data [2.9642661320713555]
Most machine learning algorithms rely heavily on large amount of data which may be collected from various sources.
To combat this issue, researchers have introduced federated learning, where a prediction model is learnt by ensuring the privacy of data of clients data.
In this research, we propose a centralized, neural network-based federated learning system.
arXiv Detail & Related papers (2023-11-16T17:14:07Z) - Dynamically Weighted Federated k-Means [0.0]
Federated clustering enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy.
We introduce a novel federated clustering algorithm named Dynamically Weighted Federated k-means (DWF k-means) based on Lloyd's method for k-means clustering.
We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure.
arXiv Detail & Related papers (2023-10-23T12:28:21Z) - Graph Federated Learning Based on the Decentralized Framework [8.619889123184649]
Graph-federated learning is mainly based on the classical federated learning framework i.e., the Client-Server framework.
We introduce the decentralized framework to graph-federated learning.
The proposed method is compared with FedAvg, Fedprox, GCFL, and GCFL+ to verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-07-19T07:40:51Z) - Personalized Graph Federated Learning with Differential Privacy [6.282767337715445]
This paper presents a personalized graph federated learning (PGFL) framework in which distributedly connected servers and their respective edge devices collaboratively learn device or cluster-specific models.
We study a variant of the PGFL implementation that utilizes differential privacy, specifically zero-concentrated differential privacy, where a noise sequence perturbs model exchanges.
Our analysis shows that the algorithm ensures local differential privacy for all clients in terms of zero-concentrated differential privacy.
arXiv Detail & Related papers (2023-06-10T09:52:01Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server.
We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Personalization Improves Privacy-Accuracy Tradeoffs in Federated
Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.