Related papers: Differentially Private Vertical Federated Clustering

Differentially Private Vertical Federated Clustering

URL: http://arxiv.org/abs/2208.01700v2
Date: Fri, 31 Mar 2023 02:55:04 GMT
Title: Differentially Private Vertical Federated Clustering
Authors: Zitao Li, Tianhao Wang, Ninghui Li
Abstract summary: In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques. The algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering.
Score: 13.27934054846057
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the final k-means utility to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.

Related papers

A Stochastic Optimization Framework for Private and Fair Learning From Decentralized Data [14.748203847227542]
We develop a novel algorithm for private and fair federated learning (FL) Our algorithm satisfies inter-silo record-level differential privacy (ISRL-DP) Experiments demonstrate the state-of-the-art fairness-accuracy framework tradeoffs of our algorithm across different privacy levels.
arXiv Detail & Related papers (2024-11-12T15:51:35Z)
Personalized Federated Learning for Cross-view Geo-localization [49.40531019551957]
We propose a methodology combining Federated Learning (FL) with Cross-view Image Geo-localization (CVGL) techniques. Our method implements a coarse-to-fine approach, where clients share only the coarse feature extractors while keeping fine-grained features specific to local environments. Results demonstrate that our federated CVGL method achieves performance close to centralized training while maintaining data privacy.
arXiv Detail & Related papers (2024-11-07T13:25:52Z)
Differentially Private Clustered Federated Learning [4.768272342753616]
Federated learning (FL) often incorporates differential privacy (DP) to provide rigorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a clustered FL) We propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly.
arXiv Detail & Related papers (2024-05-29T17:03:31Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Dynamically Weighted Federated k-Means [0.0]
Federated clustering enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy. We introduce a novel federated clustering algorithm named Dynamically Weighted Federated k-means (DWF k-means) based on Lloyd's method for k-means clustering. We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure.
arXiv Detail & Related papers (2023-10-23T12:28:21Z)
Graph Federated Learning Based on the Decentralized Framework [8.619889123184649]
Graph-federated learning is mainly based on the classical federated learning framework i.e., the Client-Server framework. We introduce the decentralized framework to graph-federated learning. The proposed method is compared with FedAvg, Fedprox, GCFL, and GCFL+ to verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-07-19T07:40:51Z)
Personalized Graph Federated Learning with Differential Privacy [6.282767337715445]
This paper presents a personalized graph federated learning (PGFL) framework in which distributedly connected servers and their respective edge devices collaboratively learn device or cluster-specific models. We study a variant of the PGFL implementation that utilizes differential privacy, specifically zero-concentrated differential privacy, where a noise sequence perturbs model exchanges. Our analysis shows that the algorithm ensures local differential privacy for all clients in terms of zero-concentrated differential privacy.
arXiv Detail & Related papers (2023-06-10T09:52:01Z)
Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy. We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.