Differentially Private Learning with Per-Sample Adaptive Clipping
- URL: http://arxiv.org/abs/2212.00328v3
- Date: Tue, 2 May 2023 04:35:26 GMT
- Title: Differentially Private Learning with Per-Sample Adaptive Clipping
- Authors: Tianyu Xia and Shuheng Shen and Su Yao and Xinyi Fu and Ke Xu and
Xiaolong Xu and Xing Fu
- Abstract summary: We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches state-of-the-art methods on multiple mainstream vision and language tasks.
- Score: 8.401653565794353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy in AI has drawn sustained attention from researchers and the
general public in recent years. As one way to implement privacy-preserving AI,
differentially private learning is a framework that trains AI models under
differential privacy (DP). To achieve DP during training, existing algorithms
typically limit the magnitude of per-sample gradients with a constant clipping
threshold, which requires careful tuning because of its significant impact on
model performance. To address this issue, recent works NSGD and Auto-S propose
using normalization instead of clipping to avoid this hyperparameter tuning.
However, normalization-based approaches such as NSGD and Auto-S rely on a
monotonic weight function, which assigns excessive weight to small-gradient
samples and introduces extra deviation into the update. In this paper, we
propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC)
algorithm based on a non-monotonic adaptive weight function, which guarantees
privacy without the hyperparameter tuning typically required by constant
clipping, while significantly reducing the deviation between the update and the
true batch-averaged gradient. We provide a rigorous theoretical convergence
analysis and show that, at a convergence rate of the same order, the proposed
algorithm achieves a lower non-vanishing bound, maintained over training
iterations, than NSGD/Auto-S. In addition, through extensive experimental
evaluation, we show that DP-PSAC outperforms or matches state-of-the-art
methods on multiple mainstream vision and language tasks.
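Below is a minimal NumPy sketch of a single noisy, per-sample-weighted update in the spirit of the abstract: it contrasts an NSGD/Auto-S-style monotonic normalization weight with one plausible non-monotonic alternative. The specific weight formulas, the hyperparameters (clip_norm, gamma, r, noise_mult), and the noise calibration are illustrative assumptions, not the exact DP-PSAC algorithm or its settings.

```python
# Hedged sketch (not the authors' code): one DP-SGD-style step with per-sample
# adaptive weights. All formulas and hyperparameters below are illustrative.
import numpy as np

def monotonic_weight(grad_norm, clip_norm=1.0, gamma=0.01):
    # NSGD/Auto-S-style normalization: the weight keeps growing as the
    # gradient norm shrinks, over-weighting small-gradient samples.
    return clip_norm / (grad_norm + gamma)

def non_monotonic_weight(grad_norm, clip_norm=1.0, r=0.01):
    # One plausible non-monotonic alternative: behaves like clip_norm/grad_norm
    # for large gradients but stays bounded near clip_norm as grad_norm -> 0.
    return clip_norm / (grad_norm + r / (grad_norm + r))

def private_update(per_sample_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0,
                   weight_fn=non_monotonic_weight, seed=0):
    """Weight each per-sample gradient, average, add Gaussian noise, and
    return the resulting (negative) update. Both weight functions above keep
    each weighted gradient's norm at or below clip_norm, so the noise is
    scaled to clip_norm / batch_size (illustrative calibration)."""
    rng = np.random.default_rng(seed)
    weighted = [weight_fn(np.linalg.norm(g), clip_norm) * g
                for g in per_sample_grads]
    batch_avg = np.mean(weighted, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_sample_grads),
                       size=batch_avg.shape)
    return -lr * (batch_avg + noise)

# Tiny example: the first sample's near-zero gradient gets weight ~90 under
# the monotonic rule but weight ~1 under the non-monotonic rule.
grads = [np.array([1e-3, 0.0]), np.array([3.0, 4.0])]
print(private_update(grads, weight_fn=monotonic_weight))
print(private_update(grads, weight_fn=non_monotonic_weight))
```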
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv Detail & Related papers (2024-11-05T12:47:30Z) - DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DP-SGD with weight clipping [1.0878040851638]
We present a novel approach that mitigates the bias arising from traditional gradient clipping.
By leveraging a public upper bound of the Lipschitz value of the current model and its current location within the search domain, we can achieve refined noise level adjustments.
arXiv Detail & Related papers (2023-10-27T09:17:15Z) - The importance of feature preprocessing for differentially private
linear optimization [38.125699428109826]
One of the most popular algorithms for training differentially private models is differentially private gradient descent (DPSGD).
We show that, even in the simple case of linear classification and unlike in non-private optimization, (private) feature preprocessing is vital for differentially private optimization.
We propose an algorithm called DPSGDF, which combines DPSGD with feature preprocessing, and prove that for classification tasks it incurs an optimality gap proportional to the diameter of the features.
arXiv Detail & Related papers (2023-07-19T20:20:52Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and
Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune the clipping threshold R for any DP optimizer (a sketch of the constant-clipping baseline it replaces follows this list).
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
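For comparison with the normalization- and adaptive-weight-based methods listed above, the sketch below shows the constant-clipping DP-SGD baseline that several of these papers modify or replace, along with the clipping bias that the error-feedback and weight-clipping papers target. The clipping threshold R, noise multiplier, and noise calibration are illustrative assumptions, not settings from any cited paper.

```python
# Hedged sketch of the constant-clipping DP-SGD baseline; all values are
# illustrative, not taken from any of the papers above.
import numpy as np

def clip_per_example(grad, clip_threshold):
    # Standard per-example clipping: rescale the gradient only when its norm
    # exceeds the threshold R.
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip_threshold / max(norm, 1e-12))

def dpsgd_step(params, per_sample_grads, lr=0.1, clip_threshold=1.0,
               noise_mult=1.0, seed=0):
    """One DP-SGD update: clip each per-sample gradient, average the clipped
    gradients, add Gaussian noise scaled to clip_threshold / batch_size
    (illustrative calibration), and take a gradient step."""
    rng = np.random.default_rng(seed)
    clipped = [clip_per_example(g, clip_threshold) for g in per_sample_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_threshold / len(per_sample_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

# Clipping bias in miniature: with one large gradient, the clipped mean can be
# far from the true batch mean, which is the deviation the error-feedback and
# weight-clipping papers aim to reduce.
grads = [np.array([0.1, 0.0]), np.array([10.0, 0.0])]
print("true mean:   ", np.mean(grads, axis=0))
print("clipped mean:", np.mean([clip_per_example(g, 1.0) for g in grads], axis=0))
print("one DP-SGD step:", dpsgd_step(np.zeros(2), grads))
```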