Lifelong DP: Consistently Bounded Differential Privacy in Lifelong
Machine Learning
- URL: http://arxiv.org/abs/2207.12831v1
- Date: Tue, 26 Jul 2022 11:55:21 GMT
- Title: Lifelong DP: Consistently Bounded Differential Privacy in Lifelong
Machine Learning
- Authors: Phung Lai, Han Hu, NhatHai Phan, Ruoming Jin, My T. Thai, An M. Chen
- Abstract summary: We show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss.
We introduce a formal definition of Lifelong DP, in which the participation of any data tuple in the training set of any task is protected.
We propose a scalable and heterogeneous algorithm, called L2DP-ML, to efficiently train and continue releasing new versions of an L2M model.
- Score: 28.68587691924582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we show that the process of continually learning new tasks and
memorizing previous tasks introduces unknown privacy risks and challenges to
bound the privacy loss. Based upon this, we introduce a formal definition of
Lifelong DP, in which the participation of any data tuple in the training set
of any task is protected, under a consistently bounded DP protection, given a
growing stream of tasks. A consistently bounded DP means having only one fixed
value of the DP privacy budget, regardless of the number of tasks. To preserve
Lifelong DP, we propose a scalable and heterogeneous algorithm, called L2DP-ML,
with streaming batch training, to efficiently train and continue releasing
new versions of an L2M model, given the heterogeneity in terms of data sizes
and the training order of tasks, without affecting DP protection of the private
training set. An end-to-end theoretical analysis and thorough evaluations show
that our mechanism is significantly better than baseline approaches in
preserving Lifelong DP. The implementation of L2DP-ML is available at:
https://github.com/haiphanNJIT/PrivateDeepLearning.
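The "consistently bounded" requirement can be made concrete with a toy privacy-accounting comparison. Under basic sequential composition, running an eps-DP mechanism once per task costs eps per task, so the total privacy loss grows linearly with the task stream; Lifelong DP instead demands one fixed budget regardless of the number of tasks. A minimal sketch of that accounting contrast (illustrative only, not the L2DP-ML mechanism itself):

```python
def naive_composition(eps_per_task: float, num_tasks: int) -> float:
    """Total privacy loss under basic sequential composition:
    each task trained independently with an eps-DP mechanism."""
    return eps_per_task * num_tasks

def consistently_bounded(eps_total: float, num_tasks: int) -> float:
    """Lifelong DP's requirement: one fixed budget, independent of
    how many tasks arrive in the stream."""
    return eps_total

# The gap widens with every new task in the stream.
for t in (1, 10, 100):
    print(t, naive_composition(1.0, t), consistently_bounded(1.0, t))
```

The point of the comparison is that naive per-task accounting is unbounded as the stream grows, which is exactly the failure mode the Lifelong DP definition rules out.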
Related papers
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring that models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z)
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds.
We obtain bounds for membership inference that match state-of-the-art techniques.
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- DPZero: Private Fine-Tuning of Language Models without Backpropagation [49.365749361283704]
We introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates.
The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks.
arXiv Detail & Related papers (2023-10-14T18:42:56Z)
- Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning [66.20311762506702]
Dataset pruning (DP) has emerged as an effective way to improve data efficiency.
We propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings.
We show that source data classes can be pruned by up to 40% to 80% without sacrificing downstream performance.
arXiv Detail & Related papers (2023-10-13T00:07:49Z)
- Personalized DP-SGD using Sampling Mechanisms [5.50042037663784]
We extend Differentially Private Stochastic Gradient Descent (DP-SGD) to support a recent privacy notion called ($\Phi$,$\Delta$)-Personalized Differential Privacy (($\Phi$,$\Delta$)-PDP).
Our algorithm uses a multi-round personalized sampling mechanism and embeds it within the DP-SGD iteration.
Experiments on real datasets show that our algorithm outperforms DP-SGD and simple combinations of DP-SGD with existing PDP mechanisms.
arXiv Detail & Related papers (2023-05-24T13:56:57Z)
- How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy [22.906644117887133]
Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization.
The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models.
This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees.
arXiv Detail & Related papers (2023-03-01T16:56:39Z)
- A New Dimensionality Reduction Method Based on Hensel's Compression for Privacy Protection in Federated Learning [1.0152838128195467]
We propose a two-layer privacy protection approach to overcome the limitations of existing DP-based approaches.
The first layer reduces the dimension of the training dataset based on Hensel's Lemma.
The second layer applies DP to the compressed dataset generated by the first layer.
arXiv Detail & Related papers (2022-05-01T23:52:16Z)
- Exploration-Exploitation in Constrained MDPs [79.23623305214275]
We investigate the exploration-exploitation dilemma in Constrained Markov Decision Processes (CMDPs).
While learning in an unknown CMDP, an agent should trade off exploration to discover new information about the MDP against exploitation of its current knowledge.
While the agent will eventually learn a good or optimal policy, we do not want the agent to violate the constraints too often during the learning process.
arXiv Detail & Related papers (2020-03-04T17:03:56Z)
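Several of the entries above (DP-SGD bounds, DP-ZO, personalized DP-SGD) build on one core primitive: clip each example's gradient to a fixed norm, average, then add Gaussian noise calibrated to that clipping bound. A minimal single-step sketch, assuming per-example gradients are already available (illustrative only, not any listed paper's implementation):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD update: clip each example's gradient to clip_norm,
    sum, add Gaussian noise with std noise_multiplier * clip_norm,
    then average and take a gradient step."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only gradients whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    n = len(clipped)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / n
    return params - lr * noisy_mean
```

Clipping bounds each example's influence on the update (the sensitivity), which is what lets the Gaussian noise translate into a formal (eps, delta)-DP guarantee via a privacy accountant.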
This list is automatically generated from the titles and abstracts of the papers in this site.