Related papers: DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning

URL: http://arxiv.org/abs/2511.07843v1
Date: Wed, 12 Nov 2025 01:23:44 GMT
Title: DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning
Authors: Jay Chooi, Kevin Cong, Russell Li, Lillian Sun
Abstract summary: Differential privacy offers formal guarantees to protect against information leakage during model training. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of strong empirical performance. We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As deep learning methods increasingly utilize sensitive data on a widespread scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model training. A significant challenge remains in implementing DP optimizers that retain strong performance while preserving privacy. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of strong empirical performance. We study DP-AdamW and introduce DP-AdamW-BC, a differentially private variant of the AdamW optimizer with DP bias correction for the second moment estimator. We start by showing theoretical results for privacy and convergence guarantees of DP-AdamW and DP-AdamW-BC. Then, we empirically analyze the behavior of both optimizers across multiple privacy budgets (ε = 1, 3, 7). We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification. Moreover, we empirically show that incorporating bias correction in DP-AdamW (DP-AdamW-BC) consistently decreases accuracy, in contrast to the improvement of DP-AdamBC over DP-Adam.
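
Illustrative sketch (not from the paper): the abstract combines three ingredients, namely DP-SGD-style gradient privatization, AdamW's decoupled weight decay, and an optional DP bias correction of the second moment. The NumPy function below shows one way these pieces could fit together in a single update step. It is an assumption-laden sketch, not the authors' implementation: the function names, hyperparameter defaults, and the exact form of the bias correction (subtracting the per-coordinate noise variance (σC/B)² from the second-moment estimate, floored at a small gamma) are choices made here for illustration, and no privacy accounting is performed.

```python
import numpy as np

def init_state(theta):
    """Hypothetical helper: zeroed Adam moments and a step counter."""
    return {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta)}

def dp_adamw_step(theta, per_example_grads, state, *, lr=1e-3,
                  betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2,
                  clip_norm=1.0, noise_multiplier=1.0,
                  dp_bias_correction=False, gamma=1e-8, rng=None):
    """One sketched DP-AdamW update on a flat parameter vector.

    theta: shape (d,); per_example_grads: shape (B, d).
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_size = per_example_grads.shape[0]

    # Privatize the gradient: clip each example to L2 norm <= clip_norm,
    # sum, add Gaussian noise with std noise_multiplier * clip_norm, average.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    g = (clipped_sum + noise) / batch_size

    # Standard Adam moment estimates with initialization bias correction.
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1.0 - b1) * g
    state["v"] = b2 * state["v"] + (1.0 - b2) * g ** 2
    m_hat = state["m"] / (1.0 - b1 ** state["t"])
    v_hat = state["v"] / (1.0 - b2 ** state["t"])

    if dp_bias_correction:
        # Assumed BC form: remove the variance the DP noise adds to v_hat.
        noise_var = (noise_multiplier * clip_norm / batch_size) ** 2
        denom = np.sqrt(np.maximum(v_hat - noise_var, gamma))
    else:
        denom = np.sqrt(v_hat) + eps

    # Decoupled weight decay: applied to theta directly, outside the
    # adaptive rescaling, as in non-private AdamW.
    return theta - lr * (m_hat / denom + weight_decay * theta)
```

A typical call would create the state once with `init_state(theta)` and then pass a freshly computed `(B, d)` array of per-example gradients at every step. Because the weight decay term is added after the division by `denom`, it is not rescaled by the noisy second-moment estimate.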

Related papers

DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models [14.755143405057929]
We propose DP-FedAdamW, the first AdamW-based optimizer for Differentially Private FL (DPFL). It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to the global descent to curb client drift. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18.
arXiv Detail & Related papers (2026-02-23T15:15:47Z)
DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning [7.445350484328613]
Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private training is still predominantly performed with DP-SGD.
arXiv Detail & Related papers (2025-11-25T17:17:48Z)
DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction [47.65999101635902]
Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from trained machine learning models. We develop a new component, called DOPPLER, which works by effectively amplifying the gradient while suppressing DP noise within this frequency domain. Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy.
arXiv Detail & Related papers (2024-08-24T04:27:07Z)
Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models. DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage. We develop a novel DP continual pre-training strategy using only 10% of public data. Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction) [0.0]
We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam. DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
arXiv Detail & Related papers (2023-12-21T23:42:00Z)
Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation. We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation [0.0]
We observe that the traditional use of DP with the Adam optimizer introduces a bias in the second moment estimation, due to the addition of independent noise in the gradient computation. This bias leads to a different scaling for low-variance parameter updates that is inconsistent with the behavior of non-private Adam and with Adam's sign descent interpretation.
arXiv Detail & Related papers (2023-04-21T18:43:37Z)
Differentially Private Diffusion Models [46.46256537222917]
We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs). We propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments.
arXiv Detail & Related papers (2022-10-18T15:20:47Z)
Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data. We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. Private training using DP-SGD protects against leakage by injecting noise into individual example gradients. While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
DP-FP: Differentially Private Forward Propagation for Large Models [2.062295244789704]
We show how to mitigate the performance drop by replacing Differentially Private Gradient Descent with a novel DP Forward-Propagation (DP-FP). Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
arXiv Detail & Related papers (2021-12-29T07:32:29Z)
