Privately Customizing Prefinetuning to Better Match User Data in
Federated Learning
- URL: http://arxiv.org/abs/2302.09042v1
- Date: Fri, 17 Feb 2023 18:18:22 GMT
- Title: Privately Customizing Prefinetuning to Better Match User Data in
Federated Learning
- Authors: Charlie Hou, Hongyuan Zhan, Akshat Shrivastava, Sid Wang, Sasha
Livshits, Giulia Fanti, Daniel Lazar
- Abstract summary: In Federated Learning (FL), accessing private client data incurs communication and privacy costs.
We propose FreD (Federated Private Fréchet Distance) -- a privately computed distance between a prefinetuning dataset and federated datasets.
We show empirically that FreD accurately predicts the best prefinetuning dataset at minimal privacy cost.
- Score: 3.645000701985685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Federated Learning (FL), accessing private client data incurs
communication and privacy costs. As a result, FL deployments commonly
prefinetune pretrained foundation models on a (large, possibly public) dataset
that is held by the central server; they then FL-finetune the model on a
private, federated dataset held by clients. Evaluating prefinetuning dataset
quality reliably and privately is therefore of high importance. To this end, we
propose FreD (Federated Private Fréchet Distance) -- a privately computed
distance between a prefinetuning dataset and federated datasets. Intuitively,
it privately computes and compares a Fréchet distance between embeddings
generated by a large language model on both the central (public) dataset and
the federated private client data. To make this computation privacy-preserving,
we use distributed, differentially-private mean and covariance estimators. We
show empirically that FreD accurately predicts the best prefinetuning dataset
at minimal privacy cost. Altogether, using FreD we demonstrate a
proof-of-concept for a new approach in private FL training: (1) customize a
prefinetuning dataset to better match user data, (2) prefinetune, and (3)
perform FL-finetuning.
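The distance at the core of FreD is the standard Fréchet distance between two Gaussians fitted to embedding statistics. The sketch below is illustrative rather than the paper's implementation: it shows the closed-form distance, plus a simple centralized Gaussian-mechanism stand-in for the differentially private mean/covariance step (the paper's estimators are distributed across clients, and the `clip_norm`/`noise_mult` parameters here are hypothetical).

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def dp_gaussian_stats(embeddings, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Illustrative (not the paper's) DP-style mean/covariance estimate:
    clip each embedding to clip_norm, then add Gaussian noise calibrated
    to the clipped sensitivity."""
    rng = np.random.default_rng(rng)
    n, d = embeddings.shape
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    clipped = embeddings * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mu = clipped.mean(axis=0) + rng.normal(0.0, noise_mult * clip_norm / n, size=d)
    centered = clipped - mu
    cov = centered.T @ centered / n
    cov += rng.normal(0.0, noise_mult * clip_norm**2 / n, size=(d, d))
    cov = (cov + cov.T) / 2.0  # re-symmetrize after adding noise
    return mu, cov
```

With statistics from the public candidate dataset and the federated private data, the candidate with the smallest `frechet_distance` would be selected for prefinetuning.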
Related papers
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Differentially Private Vertical Federated Clustering [13.27934054846057]
In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes.
To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques.
The algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering.
arXiv Detail & Related papers (2022-08-02T19:23:48Z)
- Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data [20.757477553095637]
Federated learning (FL) is a privacy-promoting framework that enables clients to collaboratively train machine learning models.
A major challenge in federated learning arises when the local data is heterogeneous.
We propose FedDPMS, an FL algorithm in which clients deploy variational auto-encoders to augment local datasets with data synthesized using differentially private means of latent data representations.
arXiv Detail & Related papers (2022-06-01T18:00:48Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy [67.4471689755097]
This paper empirically demonstrates that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity.
We provide a convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates.
arXiv Detail & Related papers (2021-06-25T14:47:19Z)
- PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization [6.66389628571674]
Federated learning (FL) has become a prevalent distributed machine learning paradigm with improved privacy.
This paper introduces a new concept called federated adaptation, which aims to adapt the trained model in a federated manner to achieve better personalization results.
We propose PFA, a framework to accomplish Privacy-preserving Federated Adaptation.
arXiv Detail & Related papers (2021-03-02T08:07:34Z)
- Personalized Federated Learning with First Order Model Optimization [76.81546598985159]
We propose an alternative to federated learning, where each client federates with other relevant clients to obtain a stronger model per client-specific objectives.
We do not assume knowledge of underlying data distributions or client similarities, and allow each client to optimize for arbitrary target distributions of interest.
Our method outperforms existing alternatives, while also enabling new features for personalized FL such as transfer outside of local data distributions.
arXiv Detail & Related papers (2020-12-15T19:30:29Z)
- Generating private data with user customization [9.415164800448853]
Mobile devices can produce and store large amounts of data that can enhance machine learning models.
However, this data may contain private information specific to the data owner that prevents the release of the data.
We want to reduce the correlation between user-specific private information and the data while retaining the useful information.
arXiv Detail & Related papers (2020-12-02T19:13:58Z)
- Prioritized Multi-Criteria Federated Learning [16.35440946424973]
In Machine Learning scenarios, privacy is a crucial concern when models have to be trained with private data coming from users of a service.
We propose Federated Learning (FL) as a means to build ML models based on private datasets distributed over a large number of clients.
A central coordinating server receives locally computed updates from clients and aggregates them to obtain a better global model.
arXiv Detail & Related papers (2020-07-17T10:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.