Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory
- URL: http://arxiv.org/abs/2602.23516v2
- Date: Thu, 05 Mar 2026 14:42:22 GMT
- Title: Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory
- Authors: Meisam Mohammady, Qin Yang, Nicholas Stout, Ayesha Samreen, Han Wang, Christopher J Quinn, Yuan Hong
- Abstract summary: Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning. We introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees.
- Score: 11.109662190596412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1 norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at epsilon=0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.
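To make the setting concrete: for an n-dimensional gradient g, the L1 norm can exceed the L2 norm by a factor of up to sqrt(n), which is why L1 clipping forces the Laplace noise scale to grow with model size. Below is a minimal sketch of one Laplace DP-SGD step with per-example L2 clipping; the function and its `noise_scale` parameter are illustrative assumptions, and the paper's actual contribution, the majorization-based multivariate accountant that calibrates this noise to a target epsilon, is not reproduced here.

```python
import numpy as np

def laplace_dpsgd_step(params, per_sample_grads, clip_norm, noise_scale, lr):
    """One illustrative Laplace DP-SGD step with per-example L2 clipping.

    `noise_scale` is a placeholder: Lap2's multivariate moment accountant
    (not reproduced here) is what calibrates it to a target epsilon while
    justifying L2 -- rather than L1 -- clipping for Laplace noise.
    """
    # Clip each per-example gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Sum, add coordinate-wise Laplace noise, average over the batch, and step.
    noisy = clipped.sum(axis=0) + np.random.laplace(0.0, noise_scale, size=clipped.shape[1])
    return params - lr * noisy / per_sample_grads.shape[0]
```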
Related papers
- Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD [7.787109481104569]
We analyze Differentially Private Stochastic Gradient Descent (DP-SGD) in the $f$-differential privacy framework. We prove that enforcing a small separation imposes a strict lower bound on the noise multiplier, which directly limits the achievable utility. Our experiments confirm that the noise levels implied by this bound lead to significant accuracy degradation in realistic training settings.
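The summary does not reproduce the bound itself. As a standard point of reference only (Gaussian differential privacy, not the paper's $f$-DP result), a fixed privacy target already pins down the noise multiplier after composition:

```latex
% Standard Gaussian-DP facts, shown only to illustrate why a lower bound
% on the noise multiplier \sigma limits utility; this is NOT the paper's bound.
\[
  \text{one step: } \mu = \tfrac{1}{\sigma}, \qquad
  T \text{ compositions: } \mu_T = \tfrac{\sqrt{T}}{\sigma}
  \quad\Longrightarrow\quad
  \sigma \;\ge\; \frac{\sqrt{T}}{\mu_T}.
\]
```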
arXiv Detail & Related papers (2026-01-15T09:50:36Z) - LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models [31.718398512438238]
We propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism.
It takes the first step to enable the tight composition of accurately fine-tuning language models with a sub-optimal DP mechanism.
LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees.
arXiv Detail & Related papers (2024-05-29T05:32:50Z) - Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning [0.0]
Differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods remain limited by the inherent inefficiency of SGD.
arXiv Detail & Related papers (2024-02-12T17:24:15Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
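A rough sketch of why zeroth-order fine-tuning pairs well with DP: only a scalar is privatized per step, so the added noise does not scale with model dimension. The function below is an assumption-laden illustration (the names, the Gaussian mechanism, and the single-example form are mine), not DP-ZO's exact algorithm.

```python
import numpy as np

def dp_zo_step(params, loss_fn, lr, mu=1e-3, clip=1.0, sigma=1.0,
               rng=np.random.default_rng()):
    """Illustrative private zeroth-order step.

    Only the scalar finite-difference estimate is clipped and noised, so
    the privatized quantity is one-dimensional regardless of model size.
    """
    z = rng.standard_normal(params.shape)                   # shared random direction
    # Two-point finite-difference estimate of the directional derivative.
    d = (loss_fn(params + mu * z) - loss_fn(params - mu * z)) / (2.0 * mu)
    d = float(np.clip(d, -clip, clip))                      # bound the sensitivity
    d += rng.normal(0.0, sigma * clip)                      # privatize the scalar only
    return params - lr * d * z
```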
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
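The failure mode is easy to reproduce: adding dense noise to an embedding-table gradient that touches only a few rows makes every coordinate nonzero. A minimal sketch of the problem (the selective-noise remedy of DP-FEST/DP-AdaFEST is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                                        # embedding-table-sized gradient
grad = np.zeros(n)
grad[rng.choice(n, size=50, replace=False)] = 1.0    # batch touches only 50 rows

# Naive DP-SGD adds dense noise, so every coordinate becomes nonzero and
# the sparse-update fast path of embedding training is lost.
noisy = grad + rng.normal(0.0, 0.1, size=n)
print(np.count_nonzero(grad), np.count_nonzero(noisy))  # 50 vs. ~1_000_000
```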
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
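Since the added DP noise is mean-zero, the systematic error of the private gradient oracle comes entirely from clipping. A hedged sketch of that bias (the helper name is mine; BAM's actual objective modification is not reproduced):

```python
import numpy as np

def clipping_bias(per_sample_grads, clip_norm):
    """Bias of the clipped-mean gradient oracle (before noise is added).

    The additive DP noise is mean-zero, so this gap is the estimator bias
    that bias-aware training aims to provably reduce.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return clipped.mean(axis=0) - per_sample_grads.mean(axis=0)
```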
arXiv Detail & Related papers (2023-08-23T09:20:41Z) - High-Dimensional Private Empirical Risk Minimization by Greedy Coordinate Descent [11.49109939095326]
We study differentially private empirical risk minimization (DP-ERM).
We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems.
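A plausible building block for such a method, sketched under assumptions rather than taken from the paper, is report-noisy-max coordinate selection: only the selection step looks at all n coordinates, which hints at how a mild dependence on dimension can arise.

```python
import numpy as np

def private_greedy_coordinate(grad, sensitivity, eps, rng=np.random.default_rng()):
    """Report-noisy-max: privately pick an (approximately) largest coordinate.

    A standard DP selection primitive, shown as one way a greedy private
    coordinate step could be implemented -- not the paper's exact procedure.
    """
    scores = np.abs(grad) + rng.laplace(0.0, 2.0 * sensitivity / eps, size=grad.shape)
    return int(np.argmax(scores))
```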
arXiv Detail & Related papers (2022-07-04T16:27:00Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
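The core per-example difference between the two algorithms fits in a few lines. This is a simplified sketch (DP-NSGD's normalization additionally includes a regularization term not shown here):

```python
import numpy as np

def per_sample_transform(g, C, normalize):
    """Per-example transform: clipping (DP-SGD) vs. normalization (DP-NSGD).

    Clipping leaves small gradients untouched; normalization rescales every
    gradient to norm C, removing the clip threshold as a sensitive
    hyperparameter -- one intuition for why DP-NSGD is easier to tune.
    """
    norm = np.linalg.norm(g) + 1e-12
    return g * (C / norm) if normalize else g * min(1.0, C / norm)
```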
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While these guarantees are appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose Gradient Embedding Perturbation (GEP), an algorithm for training differentially private deep models with decent accuracy.
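A hedged sketch of the embedding-perturbation idea as this summary describes it (the function, the orthonormal-basis assumption, and the two noise scales are illustrative; GEP's exact construction, including how the subspace is obtained, is not reproduced):

```python
import numpy as np

def gep_style_perturbation(grad, basis, sigma_embed, sigma_resid,
                           rng=np.random.default_rng()):
    """GEP-like perturbation of one clipped gradient (illustrative only).

    grad: (n,) clipped gradient; basis: (k, n) with orthonormal rows
    spanning a low-dimensional subspace. Most of the signal lives in
    k << n dimensions, so the embedding can take much less noise than
    a full n-dimensional perturbation would require.
    """
    embed = basis @ grad                    # k-dim embedding of the gradient
    resid = grad - basis.T @ embed          # part outside the subspace
    embed += rng.normal(0.0, sigma_embed, size=embed.shape)
    resid += rng.normal(0.0, sigma_resid, size=resid.shape)
    return basis.T @ embed + resid          # recombine the noisy parts
```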
arXiv Detail & Related papers (2021-02-25T04:29:58Z)