DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass
- URL: http://arxiv.org/abs/2309.06746v2
- Date: Tue, 19 Sep 2023 08:19:17 GMT
- Title: DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass
- Authors: Minxin Du, Xiang Yue, Sherman S. M. Chow, Tianhao Wang, Chenyu Huang, Huan Sun
- Abstract summary: DP-Forward directly perturbs embedding matrices in the forward pass of language models.
Its utility nearly matches the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level.
- Score: 22.578388829171157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs. It satisfies stringent local DP requirements for training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian mechanism (aMGM) by drawing possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noises. Its utility on three typical tasks almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level. It saves 3× time and memory costs compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails.
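To make the clip-then-perturb shape of DP-Forward concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: it substitutes i.i.d. Gaussian noise calibrated by the classic analytic Gaussian mechanism for the paper's matrix-valued aMGM noise, and the function name, shapes, and clipping scheme are illustrative assumptions.

```python
import math
import torch

def perturb_embeddings(emb: torch.Tensor, epsilon: float, delta: float,
                       clip: float = 1.0) -> torch.Tensor:
    """Clip each token embedding's L2 norm, then add Gaussian noise.

    Simplified stand-in for DP-Forward's analytic matrix Gaussian
    mechanism (aMGM): the noise here is i.i.d., not the possibly
    non-i.i.d. matrix Gaussian noise of the paper.
    emb has shape (batch, seq_len, hidden).
    """
    # Bound sensitivity: clip every embedding row to L2 norm <= clip,
    # so replacing one token's input changes the matrix by <= 2 * clip.
    norms = emb.norm(dim=-1, keepdim=True).clamp(min=1e-12)
    emb = emb * (clip / norms).clamp(max=1.0)
    # Classic Gaussian-mechanism scale (valid for epsilon < 1); the
    # paper's aMGM calibration is tighter and shape-aware.
    sigma = 2 * clip * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return emb + sigma * torch.randn_like(emb)
```

Because the noise is added to forward-pass activations rather than gradients, the same mechanism can also protect inference-time inputs, which is the property DP-SGD lacks.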
Related papers
- Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP is prominent in safeguarding machine-learning datasets at industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z)
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analysis of DP-SGD under the two common types of batch sampling (shuffling versus Poisson subsampling).
arXiv Detail & Related papers (2024-03-26T13:02:43Z)
- Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm and derive simple closed-form bounds on the success of record-level inference attacks.
We obtain bounds for membership inference that match state-of-the-art techniques.
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z)
- Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training [9.618473763561418]
Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches.
DP-SGD, used to ensure differential privacy (DP) in DNN training, adds Gaussian noise to the clipped per-example gradients (a minimal sketch of this clip-and-noise step appears after this list).
Surprisingly, large-batch training still results in a significant decrease in performance, which poses an important challenge because strong DP guarantees necessitate the use of massive batches.
arXiv Detail & Related papers (2024-02-13T10:19:33Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Make Landscape Flatter in Differentially Private Federated Learning [69.78485792860333]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM integrates a sharpness-aware optimizer to generate locally flat models with better stability and weight robustness, which yields local updates with small norms and robustness to DP noise.
Our algorithm achieves state-of-the-art performance compared with existing baselines in DPFL.
arXiv Detail & Related papers (2023-03-20T16:27:36Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this approach is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- DP-FP: Differentially Private Forward Propagation for Large Models [2.062295244789704]
We show how to mitigate the performance drop by replacing differentially private gradient descent with a novel DP Forward-Propagation (DP-FP) scheme.
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
arXiv Detail & Related papers (2021-12-29T07:32:29Z)
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
- Fast and Memory Efficient Differentially Private-SGD via JL Projections [29.37156662314245]
DP-SGD and its variants are the only known algorithms for private training of large-scale neural networks.
We present a new framework for designing differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
arXiv Detail & Related papers (2021-02-05T06:02:10Z)
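Several entries above (e.g., the Implicit Bias and transfer-learning papers) describe the standard DP-SGD recipe: clip each example's gradient, sum, and add Gaussian noise. Below is a minimal sketch, assuming PyTorch; the per-example loop and all names are illustrative, and production libraries (e.g., Opacus) vectorize per-example gradients instead.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip=1.0, sigma=1.0):
    """One DP-SGD step: per-example clipping, then Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        # Clip the example's whole-gradient L2 norm to bound sensitivity.
        total = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = float(torch.clamp(clip / (total + 1e-12), max=1.0))
        for s, p in zip(summed, params):
            s.add_(p.grad, alpha=scale)
    with torch.no_grad():
        for s, p in zip(summed, params):
            # Noise with std sigma * clip masks any single example's
            # contribution before the averaged update is applied.
            p -= lr * (s + sigma * clip * torch.randn_like(s)) / len(xs)
```

The noise multiplier sigma is what a privacy accountant (e.g., moments accounting) translates into an (epsilon, delta) guarantee over many such steps.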