Sample-Efficient Differentially Private Fine-Tuning via Gradient Matrix Denoising
- URL: http://arxiv.org/abs/2510.01137v1
- Date: Wed, 01 Oct 2025 17:25:23 GMT
- Title: Sample-Efficient Differentially Private Fine-Tuning via Gradient Matrix Denoising
- Authors: Ali Dadsetan, Frank Rudzicz,
- Abstract summary: We address the challenge of sample efficiency in differentially private fine-tuning of large language models (LLMs) using DP-SGD.<n>While DP-SGD provides strong privacy guarantees, the added noise significantly increases the entropy of gradient matrices.<n>We propose a post-processing algorithm that leverages random matrix theory to denoise gradients, restore low-rank structure, and improve alignment with the original signal.
- Score: 15.20112773687652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of sample efficiency in differentially private fine-tuning of large language models (LLMs) using DP-SGD. While DP-SGD provides strong privacy guarantees, the added noise significantly increases the entropy of gradient matrices, disrupting their low-rank structure and slowing optimization. We propose a post-processing algorithm that leverages random matrix theory to denoise gradients, restore low-rank structure, and improve alignment with the original signal. Applied to DP-SGD fine-tuning of RoBERTa on GLUE tasks, our method improves sample efficiency compared to state-of-the-art approaches, substantially reducing training time when optimal performance is not required. This work demonstrates that matrix recovery techniques can enhance the utility of private language model training without compromising privacy guarantees.
Related papers
- Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering [4.681301898136104]
Few fine-tuning is a major challenge to achieve optimal performance in vision-language pre-trained models.<n>We propose a Bayesian approximation of Natural Descent (NGD) using a Kalman filter for CLIP models.<n>Our algorithm consistently achieves superior--or comparable--ID performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-03T16:00:45Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.<n>Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.<n>We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training [10.229653770070202]
Training generative models with differential privacy (DP) typically involves injecting noise into gradient updates or adapting the discriminator's training procedure.
We consider the slicing privacy mechanism that injects noise into random low-dimensional projections of the private data.
We present a kernel-based estimator for this divergence, circumventing the need for adversarial training.
arXiv Detail & Related papers (2024-10-25T19:32:58Z) - DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private gradients.<n>To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Towards the Flatter Landscape and Better Generalization in Federated
Learning under Client-level Differential Privacy [67.33715954653098]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM integrates Sharpness Aware of Minimization (SAM) to generate local flatness models with stability and weight robustness.
To further reduce the magnitude random noise while achieving better performance, we propose DP-FedSAM-$top_k$ by adopting the local update sparsification technique.
arXiv Detail & Related papers (2023-05-01T15:19:09Z) - Differentially Private Learning with Per-Sample Adaptive Clipping [8.401653565794353]
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
arXiv Detail & Related papers (2022-12-01T07:26:49Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.