How to DP-fy ML: A Practical Guide to Machine Learning with Differential
Privacy
- URL: http://arxiv.org/abs/2303.00654v3
- Date: Mon, 31 Jul 2023 19:10:09 GMT
- Title: How to DP-fy ML: A Practical Guide to Machine Learning with Differential
Privacy
- Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson
Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien and Abhradeep
Thakurta
- Abstract summary: Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization.
The adoption of DP is hindered by limited practical guidance on what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models.
This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees.
- Score: 22.906644117887133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ML models are ubiquitous in real-world applications and are a constant focus
of research. At the same time, the community has started to realize the
importance of protecting the privacy of ML training data.
Differential Privacy (DP) has become a gold standard for making formal
statements about data anonymization. However, while some adoption of DP has
happened in industry, attempts to apply DP to real-world, complex ML models are
still few and far between. The adoption of DP is hindered by limited practical
guidance on what DP protection entails, what privacy guarantees to aim for, and
the difficulty of achieving good privacy-utility-computation trade-offs for ML
models. Tricks for tuning and maximizing performance are scattered among papers
or stored in the heads of practitioners. Furthermore, the literature seems to
present conflicting evidence on how and whether to apply architectural
adjustments and which components are "safe" to use with DP.
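For readers new to the area, the guarantee at stake is the standard (ε, δ)-DP definition, stated here for reference; the notation is the usual one from the DP literature rather than anything specific to this paper:

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private
% if, for all neighboring datasets D and D' (differing in a single record)
% and every measurable set of outputs S,
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \, \Pr[\, M(D') \in S \,] + \delta .
```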
This work is a self-contained guide that gives an in-depth overview of the
field of DP ML and presents information about achieving the best possible DP ML
model with rigorous privacy guarantees. Our target audience is both researchers
and practitioners. Researchers interested in DP for ML will benefit from a
clear overview of current advances and areas for improvement. We include
theory-focused sections that highlight important topics such as privacy
accounting and its assumptions, and convergence. For a practitioner, we provide
a background in DP theory and a clear step-by-step guide for choosing an
appropriate privacy definition and approach, implementing DP training,
potentially updating the model architecture, and tuning hyperparameters. For
both researchers and practitioners, consistently and fully reporting privacy
guarantees is critical, and so we propose a set of specific best practices for
stating guarantees.
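To make the "implementing DP training" step concrete, the workhorse algorithm such guides cover is DP-SGD: clip each example's gradient, then add Gaussian noise calibrated to the clipping norm. The sketch below uses logistic regression and illustrative hyperparameter values; it is a minimal rendering of the recipe, not the paper's reference implementation.

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_norm=1.0, noise_multiplier=1.1,
                lr=0.1, rng=None):
    """One DP-SGD step for logistic regression (illustrative values)."""
    rng = rng or np.random.default_rng()
    per_example_grads = []
    for xi, yi in zip(X, y):
        # Per-example gradient of the logistic loss.
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        g = (p - yi) * xi
        # Clip each example's gradient to L2 norm <= clip_norm.
        per_example_grads.append(g / max(1.0, np.linalg.norm(g) / clip_norm))
    # Sum the clipped gradients and add Gaussian noise scaled to the
    # per-example sensitivity (noise_multiplier * clip_norm).
    noisy_sum = np.sum(per_example_grads, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)
```

The final (ε, δ) reported for a trained model then comes from a privacy accountant (e.g., moments/RDP accounting) applied across all steps and the batch-sampling rate; this is the "privacy accounting and its assumptions" material the guide's theory sections cover.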
Related papers
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP (formalized below), motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
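The only change relative to the (ε, δ)-DP inequality stated earlier is the notion of neighboring datasets. A common formalization, standard in the user-level DP literature rather than specific to this paper, is:

```latex
% Record-level DP lets D and D' differ in a single example. User-level DP
% strengthens this: D and D' are neighbors when they differ by the entire
% contribution of one user u,
D' \;=\; D \setminus \{\, x \in D : \mathrm{user}(x) = u \,\},
% and the (\varepsilon, \delta) inequality must hold over all such pairs.
```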
- Belt and Braces: When Federated Learning Meets Differential Privacy [22.116742377692518]
Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data.
Differential privacy (DP) is the de facto standard of privacy protection with provable guarantees.
Practitioners are often not fully aware of its development and categorization, and face a hard choice between privacy and utility (a common FL-plus-DP aggregation step is sketched after this entry).
arXiv Detail & Related papers (2024-04-29T15:51:49Z)
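For concreteness, the usual way DP enters FL is at the server: clip each client's model update and add Gaussian noise before averaging, which makes the client (rather than the example) the privacy unit. A minimal sketch, with illustrative clipping and noise values that are not taken from the paper:

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0,
                         noise_multiplier=1.0, rng=None):
    """Aggregate client model updates with per-client clipping and noise.

    `client_updates` is a list of model-delta vectors, one per client,
    so the privacy unit here is the client (user), not a single example.
    """
    rng = rng or np.random.default_rng()
    # Clip each client's update to L2 norm <= clip_norm.
    clipped = [u / max(1.0, np.linalg.norm(u) / clip_norm)
               for u in client_updates]
    # Add Gaussian noise calibrated to the per-client sensitivity.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=clipped[0].shape)
    return noisy_sum / len(client_updates)
```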
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation at a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, yet it can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning [28.68587691924582]
We show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss.
We introduce a formal definition of Lifelong DP, in which the participation of any data sample in the training set of any task is protected.
We propose a scalable and heterogeneous algorithm, called L2DP-ML, to efficiently train and continue releasing new versions of an L2M model.
arXiv Detail & Related papers (2022-07-26T11:55:21Z)
- A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning [5.769445676575767]
We review the use of differential privacy (DP) for privacy protection in machine learning (ML).
We show that, driven by the aim of preserving the accuracy of the learned models, DP-based ML implementations are so loose that they do not offer the ex ante privacy guarantees of DP.
arXiv Detail & Related papers (2022-06-09T17:13:10Z)
- Differential Privacy: What is all the noise about? [0.0]
Differential Privacy (DP) is a formal definition of privacy that provides rigorous guarantees against risks of privacy breaches during data processing.
This paper aims to provide an overview of the most important ideas, concepts and uses of DP in Machine Learning (ML).
arXiv Detail & Related papers (2022-05-19T10:12:29Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)