Differentially Private Learning with Per-Sample Adaptive Clipping
- URL: http://arxiv.org/abs/2212.00328v3
- Date: Tue, 2 May 2023 04:35:26 GMT
- Title: Differentially Private Learning with Per-Sample Adaptive Clipping
- Authors: Tianyu Xia and Shuheng Shen and Su Yao and Xinyi Fu and Ke Xu and
Xiaolong Xu and Xing Fu
- Abstract summary: We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches state-of-the-art methods on multiple mainstream vision and language tasks.
- Score: 8.401653565794353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy in AI has drawn sustained attention from researchers and the
general public in recent years. As one way to implement privacy-preserving AI,
differentially private learning is a framework that trains AI models under
differential privacy (DP). To achieve DP during training, existing algorithms
typically limit the magnitude of per-sample gradients with a constant clipping
threshold, which requires careful tuning because of its significant impact on
model performance. To address this issue, recent works NSGD and Auto-S propose
using normalization instead of clipping to avoid this hyperparameter tuning.
However, normalization-based approaches such as NSGD and Auto-S rely on a
monotonic weight function, which assigns excessive weight to small-gradient
samples and introduces extra deviation into the update. In this paper, we
propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC)
algorithm based on a non-monotonic adaptive weight function, which guarantees
privacy without the hyperparameter tuning typically required by constant
clipping, while significantly reducing the deviation between the update and the
true batch-averaged gradient. We provide a rigorous theoretical convergence
analysis and show that, at a convergence rate of the same order, the proposed
algorithm achieves a lower non-vanishing bound, maintained over training
iterations, than NSGD/Auto-S. In addition, through extensive experimental
evaluation, we show that DP-PSAC outperforms or matches state-of-the-art
methods on multiple mainstream vision and language tasks.
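Below is a minimal NumPy sketch of a single noisy, per-sample-weighted update in the spirit of the abstract: it contrasts an NSGD/Auto-S-style monotonic normalization weight with one plausible non-monotonic alternative. The specific weight formulas, the hyperparameters (clip_norm, gamma, r, noise_mult), and the noise calibration are illustrative assumptions, not the exact DP-PSAC algorithm or its settings.

```python
# Hedged sketch (not the authors' code): one DP-SGD-style step with per-sample
# adaptive weights. All formulas and hyperparameters below are illustrative.
import numpy as np

def monotonic_weight(grad_norm, clip_norm=1.0, gamma=0.01):
    # NSGD/Auto-S-style normalization: the weight keeps growing as the
    # gradient norm shrinks, over-weighting small-gradient samples.
    return clip_norm / (grad_norm + gamma)

def non_monotonic_weight(grad_norm, clip_norm=1.0, r=0.01):
    # One plausible non-monotonic alternative: behaves like clip_norm/grad_norm
    # for large gradients but stays bounded near clip_norm as grad_norm -> 0.
    return clip_norm / (grad_norm + r / (grad_norm + r))

def private_update(per_sample_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0,
                   weight_fn=non_monotonic_weight, seed=0):
    """Weight each per-sample gradient, average, add Gaussian noise, and
    return the resulting (negative) update. Both weight functions above keep
    each weighted gradient's norm at or below clip_norm, so the noise is
    scaled to clip_norm / batch_size (illustrative calibration)."""
    rng = np.random.default_rng(seed)
    weighted = [weight_fn(np.linalg.norm(g), clip_norm) * g
                for g in per_sample_grads]
    batch_avg = np.mean(weighted, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_sample_grads),
                       size=batch_avg.shape)
    return -lr * (batch_avg + noise)

# Tiny example: the first sample's near-zero gradient gets weight ~90 under
# the monotonic rule but weight ~1 under the non-monotonic rule.
grads = [np.array([1e-3, 0.0]), np.array([3.0, 4.0])]
print(private_update(grads, weight_fn=monotonic_weight))
print(private_update(grads, weight_fn=non_monotonic_weight))
```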
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv Detail & Related papers (2024-11-05T12:47:30Z) - DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DP-SGD with weight clipping [1.0878040851638]
We present a novel approach that mitigates the bias arising from traditional gradient clipping.
By leveraging a public upper bound of the Lipschitz value of the current model and its current location within the search domain, we can achieve refined noise level adjustments.
arXiv Detail & Related papers (2023-10-27T09:17:15Z) - The importance of feature preprocessing for differentially private
linear optimization [38.125699428109826]
One of the most popular algorithms for training differentially private models is differentially private gradient descent (DPSGD).
We show that, even in the simple case of linear classification and unlike in non-private optimization, (private) feature preprocessing is vital for differentially private optimization.
We propose an algorithm called DPSGDF, which combines DPSGD with feature preprocessing, and prove that for classification tasks it incurs an optimality gap proportional to the diameter of the features.
arXiv Detail & Related papers (2023-07-19T20:20:52Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and
Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune the clipping threshold R for any DP optimizer (a sketch of the constant-clipping baseline it replaces follows this list).
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
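For comparison with the normalization- and adaptive-weight-based methods listed above, the sketch below shows the constant-clipping DP-SGD baseline that several of these papers modify or replace, along with the clipping bias that the error-feedback and weight-clipping papers target. The clipping threshold R, noise multiplier, and noise calibration are illustrative assumptions, not settings from any cited paper.

```python
# Hedged sketch of the constant-clipping DP-SGD baseline; all values are
# illustrative, not taken from any of the papers above.
import numpy as np

def clip_per_example(grad, clip_threshold):
    # Standard per-example clipping: rescale the gradient only when its norm
    # exceeds the threshold R.
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip_threshold / max(norm, 1e-12))

def dpsgd_step(params, per_sample_grads, lr=0.1, clip_threshold=1.0,
               noise_mult=1.0, seed=0):
    """One DP-SGD update: clip each per-sample gradient, average the clipped
    gradients, add Gaussian noise scaled to clip_threshold / batch_size
    (illustrative calibration), and take a gradient step."""
    rng = np.random.default_rng(seed)
    clipped = [clip_per_example(g, clip_threshold) for g in per_sample_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_threshold / len(per_sample_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

# Clipping bias in miniature: with one large gradient, the clipped mean can be
# far from the true batch mean, which is the deviation the error-feedback and
# weight-clipping papers aim to reduce.
grads = [np.array([0.1, 0.0]), np.array([10.0, 0.0])]
print("true mean:   ", np.mean(grads, axis=0))
print("clipped mean:", np.mean([clip_per_example(g, 1.0) for g in grads], axis=0))
print("one DP-SGD step:", dpsgd_step(np.zeros(2), grads))
```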