DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning
- URL: http://arxiv.org/abs/2511.07843v1
- Date: Wed, 12 Nov 2025 01:23:44 GMT
- Title: DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning
- Authors: Jay Chooi, Kevin Cong, Russell Li, Lillian Sun
- Abstract summary: Differential privacy offers formal guarantees to protect against information leakage during model training. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As deep learning methods increasingly utilize sensitive data on a widespread scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model training. A significant challenge remains in implementing DP optimizers that retain strong performance while preserving privacy. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We study DP-AdamW and introduce DP-AdamW-BC, a differentially private variant of the AdamW optimizer with DP bias correction for the second moment estimator. We start by showing theoretical results for privacy and convergence guarantees of DP-AdamW and DP-AdamW-BC. Then, we empirically analyze the behavior of both optimizers across multiple privacy budgets (ε = 1, 3, 7). We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification. Moreover, we empirically show that incorporating bias correction in DP-AdamW (DP-AdamW-BC) consistently decreases accuracy, in contrast to the improvement of DP-AdamBC over DP-Adam.
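The abstract names two ingredients: AdamW's decoupled weight decay applied to privatized gradients, and an optional DP bias correction of the second moment estimator. The following is a minimal NumPy sketch of how one such update step could look, not the authors' implementation. It assumes the usual DP-SGD privatization (per-sample L2 clipping plus Gaussian noise) and a DP-AdamBC-style correction that subtracts the noise variance from the second moment. The function names, the `dp_bias_correction` flag, the `var_floor` parameter, and all default hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def init_state(params):
    """Optimizer state: step count plus first and second moment estimates."""
    return {"t": 0, "m": np.zeros_like(params), "v": np.zeros_like(params)}


def dp_adamw_step(params, per_sample_grads, state, *, lr=1e-3, beta1=0.9,
                  beta2=0.999, eps=1e-8, weight_decay=1e-2, clip_norm=1.0,
                  noise_multiplier=1.0, dp_bias_correction=False,
                  var_floor=1e-12, rng=None):
    """One illustrative DP-AdamW update (hypothetical helper, not the paper's code).

    params:           array of shape (d,)
    per_sample_grads: array of shape (batch_size, d)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_size = per_sample_grads.shape[0]

    # Privatize: clip each per-sample gradient to L2 norm <= clip_norm,
    # sum, add Gaussian noise with std noise_multiplier * clip_norm, average.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # Standard Adam moment estimates with the usual initialization correction.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * noisy_grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * noisy_grad ** 2
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])

    if dp_bias_correction:
        # Assumed correction: the injected noise inflates the second moment
        # by its per-coordinate variance, so subtract it and floor the result.
        noise_var = (noise_multiplier * clip_norm / batch_size) ** 2
        v_hat = np.maximum(v_hat - noise_var, var_floor)

    # Decoupled weight decay: shrink the weights directly instead of adding
    # weight_decay * params to the gradient before the moment estimates.
    decayed = params * (1.0 - lr * weight_decay)
    return decayed - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Because the decay term bypasses clipping, noise, and the adaptive scaling, it only touches already-private parameters and does not change the privacy accounting of the gradient release in this sketch. Setting `dp_bias_correction=True` corresponds to the bias-corrected variant, which the abstract reports as lowering accuracy in the authors' experiments.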
Related papers
- DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models [14.755143405057929]
We propose DPFedAdamW, the first AdamW-based optimizer for Differentially Private FL (DPFL). It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to the global descent to curb client drift. Our empirical results demonstrate the effectiveness of DPFedAdamW across language and vision Transformers and ResNet-18.
arXiv Detail & Related papers (2026-02-23T15:15:47Z) - DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning [7.445350484328613]
Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private training is still predominantly performed with DP-SGD, typically.
arXiv Detail & Related papers (2025-11-25T17:17:48Z) - DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction [47.65999101635902]
Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from trained machine learning models.
We develop a new component, called DOPPLER, which works by effectively amplifying the gradient while suppressing DP noise within this frequency domain.
Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy.
arXiv Detail & Related papers (2024-08-24T04:27:07Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction) [0.0]
We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam.
DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
arXiv Detail & Related papers (2023-12-21T23:42:00Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation [0.0]
We observe that the traditional use of DP with the Adam optimizer introduces a bias in the second moment estimation, due to the addition of independent noise in the gradient computation.
This bias leads to a different scaling for low-variance parameter updates, which is inconsistent with the behavior of non-private Adam and with Adam's sign descent interpretation.
arXiv Detail & Related papers (2023-04-21T18:43:37Z) - Differentially Private Diffusion Models [46.46256537222917]
We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs)
We propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs.
We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments.
arXiv Detail & Related papers (2022-10-18T15:20:47Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - DP-FP: Differentially Private Forward Propagation for Large Models [2.062295244789704]
We show how to mitigate the performance drop by replacing Differentially Private Gradient Descent with a novel DP Forward-Propagation (DP-FP).
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
arXiv Detail & Related papers (2021-12-29T07:32:29Z)