On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
- URL: http://arxiv.org/abs/2510.20616v1
- Date: Thu, 23 Oct 2025 14:48:03 GMT
- Title: On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
- Authors: Aki Rehn, Linzh Zhao, Mikko A. Heikkilä, Antti Honkela
- Abstract summary: We show a clear mismatch between the current theoretical understanding of how to choose an optimal $C$ and empirical outcomes. We highlight how the common practice of using a single $(C,B)$ setting across tasks can lead to suboptimal performance.
- Score: 4.107181236953139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private (DP) transfer learning, i.e., fine-tuning a pretrained model on private data, is the current state-of-the-art approach for training large models under privacy constraints. We focus on two key hyperparameters in this setting: the clipping bound $C$ and batch size $B$. We show a clear mismatch between the current theoretical understanding of how to choose an optimal $C$ (stronger privacy requires smaller $C$) and empirical outcomes (larger $C$ performs better under strong privacy), caused by changes in the gradient distributions. Assuming a limited compute budget (fixed epochs), we demonstrate that the existing heuristics for tuning $B$ do not work, while cumulative DP noise better explains whether smaller or larger batches perform better. We also highlight how the common practice of using a single $(C,B)$ setting across tasks can lead to suboptimal performance. We find that performance drops especially when moving between loose and tight privacy and between plentiful and limited compute, which we explain by analyzing clipping as a form of gradient re-weighting and examining cumulative DP noise.
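To make the interplay concrete, here is a minimal sketch (illustrative names and shapes, not the authors' code) of a single DP-SGD step: clipping re-weights each per-example gradient by $\min(1, C/\|g_i\|)$, and the Gaussian noise added to the averaged batch gradient has standard deviation $\sigma C/B$, so $C$ and $B$ jointly set the per-step signal-to-noise ratio.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, C, sigma, lr,
                rng=np.random.default_rng()):
    """One DP-SGD update (illustrative sketch).

    per_example_grads: shape (B, d), one flattened gradient per example.
    C: clipping bound, sigma: noise multiplier, lr: learning rate.
    """
    B, d = per_example_grads.shape
    # Clipping acts as gradient re-weighting: each example is scaled
    # by min(1, C / ||g_i||), so small-norm gradients pass unchanged.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, C / norms)
    # Noise is calibrated to the sensitivity C of the clipped sum;
    # after averaging, the per-step noise std is sigma * C / B.
    noisy_mean = (clipped.sum(axis=0) + sigma * C * rng.normal(size=d)) / B
    return params - lr * noisy_mean
```

Over $T$ steps the injected noise accumulates with variance $T(\sigma C/B)^2$; this is the cumulative DP noise the abstract uses to explain when smaller or larger batches win under a fixed-epoch compute budget.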
Related papers
- Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD [7.787109481104569]
We analyze Differentially Private Stochastic Gradient Descent (DP-SGD) in the $f$-differential privacy framework. We prove that enforcing a small separation imposes a strict lower bound on the noise multiplier, which directly limits the achievable utility. Our experiments confirm that the noise levels implied by this bound lead to significant accuracy degradation in realistic training settings.
arXiv Detail & Related papers (2026-01-15T09:50:36Z)
- Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables training machine learning models while preserving the privacy of the participants. However, there is no differentially private distributed method for training smooth, non-convex problems. We introduce a new distributed algorithm $\alpha$-$\sf NormEC$ with provable convergence guarantees.
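As a rough illustration, the sketch below assumes a smoothed-normalization map of the form $g/(\alpha + \|g\|)$; the exact operator in the paper may differ.

```python
import numpy as np

def smoothed_normalize(g, alpha):
    """Smoothed normalization (hedged sketch): scale g by 1/(alpha + ||g||).
    The output norm ||g|| / (alpha + ||g||) is strictly below 1, so the
    sensitivity stays bounded without the non-smooth min() of hard clipping.
    """
    return g / (alpha + np.linalg.norm(g))
```

Because the map is smooth in $g$, it sidesteps the non-smoothness that hard clipping introduces into the analysis, which is plausibly what enables the stated convergence guarantees.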
arXiv Detail & Related papers (2025-02-19T07:10:32Z)
- Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. DaRRM is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms. We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
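A hedged sketch of the mechanism, under the assumption that DaRRM releases the true majority with a data-dependent probability $\gamma$ and otherwise answers with a fair coin (`gamma_fn` below is a hypothetical stand-in for the paper's noise function):

```python
import numpy as np

def darrm_vote(votes, gamma_fn, rng=np.random.default_rng()):
    """Randomized-response majority over K already-private binary votes."""
    K, tally = len(votes), int(np.sum(votes))
    gamma = gamma_fn(tally, K)        # data-dependent honesty probability
    if rng.random() < gamma:
        return int(tally > K / 2)     # release the true majority
    return int(rng.random() < 0.5)    # otherwise a uniform random answer
```

Optimizing over $\gamma$ is what lets DaRRM trade privacy against utility more finely than a fixed randomized-response coin.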
arXiv Detail & Related papers (2024-11-27T00:48:48Z)
- Privacy for Free in the Overparameterized Regime [19.261178173399784]
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $\left|R_P\right| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$.
arXiv Detail & Related papers (2024-10-18T18:01:11Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
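The practical consequence can be sketched in two lines: if performance at a fixed number of steps depends approximately on the effective per-step noise $\sigma C/B$, then hyperparameters can be searched with a small batch and a proportionally smaller noise multiplier, and only the final run pays the large-batch cost. The helper below is a hedged reading of that scaling law, not the paper's code.

```python
def matched_noise_multiplier(sigma_ref, B_ref, B_sim):
    """Keep sigma / B constant so a cheap small-batch run emulates the
    noisy training dynamics of an expensive large-batch private run."""
    return sigma_ref * B_sim / B_ref

# e.g. emulate a (B=4096, sigma=8.0) run with B=256 by using sigma=0.5
assert matched_noise_multiplier(8.0, 4096, 256) == 0.5
```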
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
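The difference between the two per-example transforms can be sketched as follows (a hedged sketch; `r` is the small regularizer the normalized variant uses to avoid dividing by a vanishing norm):

```python
import numpy as np

def per_example_transform(g, C=1.0, r=0.01, mode="clip"):
    """Per-example gradient transforms behind DP-SGD vs. DP-NSGD."""
    norm = np.linalg.norm(g)
    if mode == "clip":
        return g * min(1.0, C / norm)   # DP-SGD: clip only large gradients
    return g * C / (norm + r)           # DP-NSGD: rescale every gradient
```

Both maps bound the per-example contribution by $C$, so the same Gaussian noise yields the same DP guarantee; normalization removes the sharp dependence on where the clipping threshold falls, which is one hedged reading of why DP-NSGD is easier to tune.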
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
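A minimal sketch of the idea, assuming Rényi-DP accounting for the Gaussian mechanism and ignoring subsampling amplification: an example whose clipped gradient norm $c_t \le C$ at step $t$ behaves like a mechanism with sensitivity $c_t$ rather than $C$, so its individual privacy cost is smaller.

```python
import numpy as np

def individual_rdp_epsilon(per_step_norms, C, sigma, alpha=2.0):
    """Sum of per-step Renyi-DP costs alpha * c_t^2 / (2 * (sigma*C)^2),
    where c_t = min(||g_t||, C) is the example's clipped gradient norm
    and sigma*C is the noise std.  Subsampling amplification is ignored
    for simplicity."""
    c = np.minimum(np.asarray(per_step_norms, dtype=float), C)
    return float(np.sum(alpha * c**2 / (2.0 * (sigma * C) ** 2)))
```

Examples with persistently small gradients accumulate a much smaller individual $\epsilon$ than the worst-case accountant suggests, while high-loss examples, whose norms keep hitting $C$, do not.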
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning [34.630300910399036]
We characterize the fundamental communication cost required to obtain the best accuracy under $\varepsilon$ central DP. Our results show that $\tilde{O}\left(\min(n^2\varepsilon^2, d)\right)$ bits per client are both sufficient and necessary.
This provides a significant improvement relative to state-of-the-art SecAgg distributed DP schemes.
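In code form the bound is just a clamp between the two regimes (constants and log factors are hidden by the $\tilde{O}$; the helper name is ours):

```python
def bits_per_client(n, epsilon, d):
    """Order-of-magnitude communication per client under epsilon central DP:
    n^2 * eps^2 bits suffice when privacy is tight, capped by the ambient
    dimension d when it is loose."""
    return min(n**2 * epsilon**2, d)
```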
arXiv Detail & Related papers (2022-03-07T22:56:09Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
- Output Perturbation for Differentially Private Convex Optimization with Improved Population Loss Bounds, Runtimes and Applications to Private Adversarial Training [12.386462516398469]
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning.
We provide the tightest known $(\epsilon, 0)$-DP population loss bounds and fastest runtimes under the presence of smoothness and strong convexity. We apply our theory to two learning frameworks: tilted ERM and adversarial learning.
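Output perturbation itself is simple to sketch: train non-privately, then add noise whose density is proportional to $\exp(-\epsilon\|z\|/\Delta)$, where $\Delta$ is the $\ell_2$ sensitivity of the minimizer (for a $\mu$-strongly-convex, $L$-Lipschitz ERM objective on $n$ points, a standard bound is $\Delta = 2L/(n\mu)$). The sketch below samples that noise via a uniform direction and a Gamma-distributed radius; it illustrates the generic mechanism, not the paper's refined variant.

```python
import numpy as np

def output_perturbation(w_erm, sensitivity, epsilon,
                        rng=np.random.default_rng()):
    """(epsilon, 0)-DP release of an ERM minimizer w_erm (shape (d,)).
    Noise density prop. to exp(-epsilon * ||z|| / sensitivity): the norm
    follows Gamma(d, sensitivity/epsilon) and the direction is uniform."""
    d = w_erm.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=sensitivity / epsilon)
    return w_erm + radius * direction
```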
arXiv Detail & Related papers (2021-02-09T08:47:06Z)