DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning
- URL: http://arxiv.org/abs/2511.20509v2
- Date: Fri, 28 Nov 2025 12:24:09 GMT
- Title: DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning
- Authors: Mihaela Hudişteanu, Nikita P. Kalinin, Edwige Cyffers,
- Abstract summary: Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private training is still predominantly performed with DP-SGD.
- Score: 7.445350484328613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private (DP) training is still predominantly performed with DP-SGD, typically requiring extensive compute and hyperparameter tuning. We propose DP-MicroAdam, a memory-efficient and sparsity-aware adaptive DP optimizer. We prove that DP-MicroAdam converges in stochastic non-convex optimization at the optimal $\mathcal{O}(1/\sqrt{T})$ rate, up to privacy-dependent constants. Empirically, DP-MicroAdam outperforms existing adaptive DP optimizers and achieves competitive or superior accuracy compared to DP-SGD across a range of benchmarks, including CIFAR-10, large-scale ImageNet training, and private fine-tuning of pretrained transformers. These results demonstrate that adaptive optimization can improve both performance and stability under differential privacy.
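To make the abstract concrete, here is a minimal sketch of the generic DP adaptive-optimizer pattern it describes: clip each example's gradient, aggregate, add Gaussian noise, then apply an Adam-style preconditioned step. This is an illustration of the general recipe, not DP-MicroAdam's exact algorithm; all names and hyperparameters are illustrative.

```python
import numpy as np

def dp_adaptive_step(params, per_example_grads, m, v, clip_norm=1.0,
                     noise_mult=1.0, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8):
    """One generic DP adaptive update (illustrative, not DP-MicroAdam):
    per-example clipping -> noisy aggregation -> Adam-style step."""
    batch_size = len(per_example_grads)
    # Clip each per-example gradient to bound its sensitivity.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # Sum, add calibrated Gaussian noise, then average (Gaussian mechanism).
    noisy = (np.sum(clipped, axis=0)
             + noise_mult * clip_norm * np.random.randn(*params.shape))
    noisy /= batch_size
    # Adam-style first/second moment estimates on the privatized gradient.
    m = beta1 * m + (1 - beta1) * noisy
    v = beta2 * v + (1 - beta2) * noisy ** 2
    params = params - lr * m / (np.sqrt(v) + eps)
    return params, m, v
```

Because the noise is added before the adaptive moments are computed, everything after the Gaussian mechanism is post-processing and does not consume additional privacy budget.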
Related papers
- Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective [42.70658101277954]
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how noise interacts with adaptivity in optimization through the lens of differential equations. We show that DP-SGD converges at a privacy-utility trade-off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$.
arXiv Detail & Related papers (2026-03-03T18:17:57Z) - DP-FEDSOFIM: Differentially Private Federated Stochastic Optimization using Regularized Fisher Information Matrix [0.0611737116137921]
Differentially private federated learning (DP-FL) suffers from slow convergence under tight privacy budgets due to the overwhelming noise introduced to preserve privacy. We propose DP-FedSOFIM, a server-side second-order optimization framework that leverages the Fisher Information Matrix (FIM) as a natural preconditioner while requiring only O(d) memory per client. Our analysis proves that the server-side preconditioning preserves $(\varepsilon, \delta)$-differential privacy through the post-processing theorem.
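The O(d) memory claim above suggests a diagonal Fisher approximation. A minimal sketch of that idea, assuming the diagonal FIM is estimated by an exponential moving average of squared gradients (this is a generic construction, not DP-FedSOFIM's exact update):

```python
import numpy as np

def fisher_preconditioned_update(params, noisy_grad, fisher_diag,
                                 lr=0.1, damping=1e-3, decay=0.95):
    """Server-side step preconditioned by a diagonal Fisher estimate.

    The diagonal of the FIM is approximated with an EMA of squared
    (already-privatized) gradients, so memory is O(d). Since the input
    gradient is already noised, this preconditioning is pure
    post-processing and preserves the DP guarantee.
    """
    fisher_diag = decay * fisher_diag + (1 - decay) * noisy_grad ** 2
    params = params - lr * noisy_grad / (fisher_diag + damping)
    return params, fisher_diag
```

The damping term keeps the update bounded early in training, when the Fisher estimate is still near zero.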
arXiv Detail & Related papers (2026-01-14T05:11:28Z) - DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning [0.0]
Differential privacy offers formal guarantees to protect against information leakage during model training. Recent advances introduced ever more efficient optimizers, with AdamW being a popular choice for training deep learning models because of its strong empirical performance. We find that DP-AdamW outperforms existing state-of-the-art differentially private optimizers like DP-SGD, DP-Adam, and DP-AdamBC, scoring over 15% higher on text classification, up to 5% higher on image classification, and consistently 1% higher on graph node classification.
arXiv Detail & Related papers (2025-11-11T05:24:30Z) - DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers. To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
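The appeal of privatizing zeroth-order optimization is that the quantity to be clipped and noised is a single scalar per step, not a full gradient. A minimal sketch of that idea (an illustration of the general DP-ZO recipe, not the paper's exact algorithm; all parameter names are assumptions):

```python
import numpy as np

def dp_zo_step(params, loss_fn, lr=0.01, mu=1e-3, clip=1.0,
               noise_mult=1.0, rng=None):
    """One private zeroth-order step (illustrative sketch):
    1. estimate the directional derivative from two loss evaluations,
    2. clip and noise that single scalar (Gaussian mechanism),
    3. move along the sampled random direction.
    """
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_normal(params.shape)  # random perturbation direction
    # Two-point finite-difference estimate of the directional derivative.
    scalar = (loss_fn(params + mu * z) - loss_fn(params - mu * z)) / (2 * mu)
    scalar = float(np.clip(scalar, -clip, clip))  # bound sensitivity
    scalar += noise_mult * clip * rng.standard_normal()  # privatize
    return params - lr * scalar * z
```

Because only loss values are queried, no per-example gradients are ever materialized, which is what makes the approach memory-friendly for large models.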
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and
Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune the clipping threshold R for any DP optimizers.
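The core trick of automatic clipping, as this abstract describes it, can be shown in a few lines: instead of `g * min(1, R/||g||)`, every per-example gradient is rescaled by `1/(||g|| + gamma)`, so its norm is always strictly below 1 and the threshold R effectively folds into the learning rate. A minimal sketch (the stabilizer `gamma` and its default are assumptions for illustration):

```python
import numpy as np

def automatic_clip(per_example_grads, gamma=0.01):
    """Automatic clipping: normalize each per-example gradient by
    (||g|| + gamma) instead of hard-clipping at a tuned threshold R.
    Every output has norm ||g|| / (||g|| + gamma) < 1, so sensitivity
    is bounded without tuning R."""
    return [g / (np.linalg.norm(g) + gamma) for g in per_example_grads]
```

A small positive `gamma` keeps the rescaling well-defined for zero gradients and preserves some magnitude information for small ones.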
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.