DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models
- URL: http://arxiv.org/abs/2602.19945v1
- Date: Mon, 23 Feb 2026 15:15:47 GMT
- Title: DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models
- Authors: Jin Liu, Yinbin Miao, Ning Xi, Junkang Liu,
- Abstract summary: We propose DP-FedAdamW, the first AdamW-based optimizer for Differentially Private FL (DPFL). It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18.
- Score: 14.755143405057929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning of large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83\%. The code is available in the Appendix.
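The unbiased second-moment estimator at the heart of the abstract can be illustrated with a toy simulation. This is a sketch with our own numbers and names, not the paper's construction: adding Gaussian DP noise of variance sigma^2 to a gradient coordinate inflates the naive second-moment estimate by exactly sigma^2, so subtracting the known noise variance approximately recovers the unbiased value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (our own numbers): one gradient coordinate with a small true value,
# perturbed by Gaussian DP noise of known variance sigma^2.
true_grad = 0.01
sigma = 0.5
n = 1_000_000  # samples used to approximate the expectation

noisy_grads = true_grad + sigma * rng.normal(size=n)

# Naive second moment: E[(g + z)^2] = g^2 + sigma^2, i.e. biased upward by sigma^2.
v_naive = np.mean(noisy_grads ** 2)

# Debiased estimate: subtract the known DP noise variance.
v_corrected = v_naive - sigma ** 2

print(f"naive: {v_naive:.4f}, corrected: {v_corrected:.6f}")
```

The naive estimate lands near 0.25 (dominated by the noise), while the corrected one lands near the true g^2 = 0.0001; this is the same subtraction idea that DP-AdamBC-style bias correction applies inside the optimizer.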
Related papers
- DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning [0.0]
Differential privacy offers formal guarantees to protect against information leakage during model training. Recent advances have introduced ever more efficient optimizers, with AdamW a popular choice for training deep learning models because of its strong empirical performance. We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers such as DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification.
arXiv Detail & Related papers (2025-11-11T05:24:30Z) - Accelerating Differentially Private Federated Learning via Adaptive Extrapolation [2.6108066206600555]
We propose DP-FedEXP, which adaptively selects the global step size based on the diversity of the local updates. We show that DP-FedEXP provably accelerates the convergence of DP-FedAvg, and it empirically outperforms existing methods tailored for DP-FL.
arXiv Detail & Related papers (2025-04-14T03:43:27Z) - Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization [90.08459757321405]
Federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead. We propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates. By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error.
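FedAdam-SSM's shared sparsification mask is more involved, but the basic uplink-saving idea, transmitting only the largest-magnitude entries of a local update, can be sketched as follows (the helper name and numbers are ours, not the paper's):

```python
import numpy as np

# Hypothetical helper: keep only the k largest-magnitude entries of a local
# update before uplink transmission, zeroing out the rest.
def topk_sparsify(update: np.ndarray, k: int) -> np.ndarray:
    sparse = np.zeros_like(update)
    keep = np.argpartition(np.abs(update), -k)[-k:]  # indices of the top-k entries
    sparse[keep] = update[keep]
    return sparse

u = np.array([0.5, -0.1, 2.0, 0.05, -1.5])
print(topk_sparsify(u, 2))  # only 2.0 and -1.5 survive
```

Only the k retained values (and their indices) need to be sent uplink; the sparsification error this introduces is what FedAdam-SSM's divergence-bound analysis is designed to control.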
arXiv Detail & Related papers (2024-05-28T07:56:49Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction) [0.0]
We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam.
DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
arXiv Detail & Related papers (2023-12-21T23:42:00Z) - Towards the Flatter Landscape and Better Generalization in Federated Learning under Client-level Differential Privacy [67.33715954653098]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM integrates Sharpness-Aware Minimization (SAM) to generate local flat models with stability and weight robustness.
To further reduce the magnitude of random noise while achieving better performance, we propose DP-FedSAM-$top_k$ by adopting the local update sparsification technique.
arXiv Detail & Related papers (2023-05-01T15:19:09Z) - DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation [0.0]
We observe that the traditional use of DP with Adam introduces a bias in the second-moment estimation, due to the addition of independent noise in the gradient computation.
This bias leads to a different scaling for low-variance parameter updates that is inconsistent with the behavior of non-private Adam and with Adam's sign-descent interpretation.
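The inconsistent scaling described above can be made concrete with a toy per-coordinate calculation (our own construction and numbers, not the paper's): when a DP noise variance sigma2 is folded into Adam's second moment, the effective step m / sqrt(v) collapses for low-magnitude gradients, breaking Adam's roughly sign-descent behavior in which |step| is near 1 regardless of gradient scale.

```python
import numpy as np

# Per-coordinate Adam-style update with m = g and v = g^2, optionally inflated
# by an assumed DP noise variance sigma2 (the bias described in DP-Adam/DP-AdamBC).
def adam_step(g: float, sigma2: float = 0.0, eps: float = 1e-8) -> float:
    return g / (np.sqrt(g**2 + sigma2) + eps)

for g in (1.0, 0.01):  # high- vs low-magnitude gradient coordinate
    print(f"g={g}: non-private {adam_step(g):.3f}, "
          f"DP-biased {adam_step(g, sigma2=0.25):.3f}")
```

Without noise both coordinates get a step of magnitude near 1; with the biased second moment, the small-gradient coordinate's step shrinks toward |g|/sigma, which is exactly the DP-SGD-like behavior that bias correction removes.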
arXiv Detail & Related papers (2023-04-21T18:43:37Z) - Make Landscape Flatter in Differentially Private Federated Learning [69.78485792860333]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM generates local flat models with better stability and weight robustness, which results in a small norm of local updates and robustness to DP noise.
Our algorithm achieves state-of-the-art (SOTA) performance compared with existing SOTA baselines in DPFL.
arXiv Detail & Related papers (2023-03-20T16:27:36Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - DP-FP: Differentially Private Forward Propagation for Large Models [2.062295244789704]
We show how to mitigate the performance drop by replacing Differentially Private Gradient Descent with a novel DP Forward-Propagation (DP-FP) approach.
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
arXiv Detail & Related papers (2021-12-29T07:32:29Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)