Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in
Private SGD
- URL: http://arxiv.org/abs/2308.12018v1
- Date: Wed, 23 Aug 2023 09:20:41 GMT
- Title: Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in
Private SGD
- Authors: Moritz Knolle, Robert Dorfman, Alexander Ziller, Daniel Rueckert and
Georgios Kaissis
- Abstract summary: We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
- Score: 56.01810892677744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private SGD (DP-SGD) holds the promise of enabling the safe
and responsible application of machine learning to sensitive datasets. However,
DP-SGD only provides a biased, noisy estimate of a mini-batch gradient. This
renders optimisation steps less effective and limits model utility as a result.
With this work, we show a connection between per-sample gradient norms and the
estimation bias of the private gradient oracle used in DP-SGD. Here, we propose
Bias-Aware Minimisation (BAM) that allows for the provable reduction of private
gradient estimator bias. We show how to efficiently compute quantities needed
for BAM to scale to large neural networks and highlight similarities to closely
related methods such as Sharpness-Aware Minimisation. Finally, we provide
empirical evidence that BAM not only reduces bias but also substantially
improves privacy-utility trade-offs on the CIFAR-10, CIFAR-100, and ImageNet-32
datasets.
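Below is a minimal numerical sketch of the private gradient oracle the abstract refers to: per-sample gradients are clipped to norm C, averaged, and perturbed with Gaussian noise. The noise is zero-mean, so the estimator bias comes entirely from clipping and grows with the per-sample gradient norms. The batch of gradients, clipping norm, and noise multiplier below are illustrative assumptions, not values from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-sample gradients for a batch of 4 samples in R^3
# (values chosen so that some norms exceed the clipping threshold).
per_sample_grads = np.array([
    [0.5, 0.1, -0.2],
    [3.0, -1.0, 0.5],
    [0.2, 0.2, 0.1],
    [-2.0, 2.5, 1.0],
])
C = 1.0                 # clipping norm (assumed)
noise_multiplier = 1.0  # noise std = noise_multiplier * C on the summed gradient (assumed)

def dp_sgd_gradient(grads, C, noise_multiplier, rng):
    """One draw of the DP-SGD private gradient oracle: clip, average, add noise."""
    norms = np.linalg.norm(grads, axis=1)
    scale = np.minimum(1.0, C / norms)          # per-sample clipping factor
    clipped_mean = (grads * scale[:, None]).mean(axis=0)
    noise = rng.normal(0.0, noise_multiplier * C / len(grads), size=grads.shape[1])
    return clipped_mean + noise

true_mean = per_sample_grads.mean(axis=0)
# Averaging many draws removes the zero-mean Gaussian noise but not the clipping bias.
estimate = np.mean([dp_sgd_gradient(per_sample_grads, C, noise_multiplier, rng)
                    for _ in range(20_000)], axis=0)

print("true mean gradient   :", true_mean)
print("avg. private estimate:", estimate)
print("clipping bias        :", estimate - true_mean)
```
Per the abstract, BAM provably reduces this bias; a natural reading is that it does so by keeping per-sample gradient norms small relative to the clipping threshold during training, which is also where the stated similarity to Sharpness-Aware Minimisation comes from, although the exact BAM objective is not reproduced here.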
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
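For intuition only: a hypothetical per-sample scaling rule that replaces the hard clip min(1, C/||g||) with a smooth, non-monotonic weight while still bounding every scaled gradient's norm by C, so Gaussian noise can be calibrated as usual. The specific function and constants below are assumptions for illustration, not the DP-PSASC rule.
```python
import numpy as np

C = 1.0   # sensitivity bound (assumed)
r = 0.25  # stabiliser controlling where the weight peaks (hypothetical)

def hard_clip_weight(norm):
    # Standard DP-SGD clipping weight: monotonically non-increasing in the norm.
    return np.minimum(1.0, C / norm)

def adaptive_scale_weight(norm):
    # Hypothetical smooth weight: rises, peaks at norm = sqrt(r), then decays.
    # The scaled gradient norm is C * norm**2 / (norm**2 + r) < C, so per-sample
    # sensitivity stays bounded and Gaussian noise can be calibrated to C.
    return C * norm / (norm ** 2 + r)

for norm in [0.1, 0.5, 1.0, 2.0, 5.0]:
    scaled_norm = adaptive_scale_weight(norm) * norm
    print(f"||g||={norm:4.1f}  clip weight={hard_clip_weight(norm):.3f}  "
          f"adaptive weight={adaptive_scale_weight(norm):.3f}  scaled norm={scaled_norm:.3f}")
```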
arXiv Detail & Related papers (2024-11-05T12:47:30Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
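A generic error-feedback loop of the kind the summary describes, sketched in plain numpy: the clipping residual is carried to the next step instead of being discarded. This is the textbook EF pattern under assumed hyperparameters, not the paper's exact algorithm or its Rényi-DP accounting.
```python
import numpy as np

rng = np.random.default_rng(1)
C, sigma, lr, steps, dim = 1.0, 1.0, 0.1, 5, 3   # assumed hyperparameters

def clip(v, C):
    n = np.linalg.norm(v)
    return v if n <= C else v * (C / n)

w = np.zeros(dim)        # model parameters
e = np.zeros(dim)        # error-feedback accumulator
for t in range(steps):
    g = rng.normal(size=dim)          # stand-in for a mini-batch gradient
    corrected = g + e                 # add back the residual from previous steps
    transmitted = clip(corrected, C)  # bounded-sensitivity quantity that gets privatised
    e = corrected - transmitted       # carry the clipping error forward
    private = transmitted + rng.normal(0.0, sigma * C, size=dim)
    w -= lr * private
    print(f"step {t}: ||e|| = {np.linalg.norm(e):.3f}")
```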
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Sparsity-Preserving Differentially Private Training of Large Embedding
Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
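To see why naive DP-SGD conflicts with sparsity: an embedding-table gradient is nonzero only for the rows touched in a batch, but per-coordinate Gaussian noise is dense, so the privatised gradient becomes dense too. The sketch below only illustrates that problem; it does not reproduce DP-FEST or DP-AdaFEST, whose details are not given here.
```python
import numpy as np

rng = np.random.default_rng(2)

vocab_size, embed_dim = 1000, 16
grad = np.zeros((vocab_size, embed_dim))
touched_rows = rng.choice(vocab_size, size=8, replace=False)   # rows seen in the batch
grad[touched_rows] = rng.normal(size=(8, embed_dim))

sigma = 1.0  # noise scale (assumed)
noisy_grad = grad + rng.normal(0.0, sigma, size=grad.shape)    # naive dense DP-SGD noise

print("nonzero rows before noise:", np.count_nonzero(np.any(grad != 0, axis=1)))
print("nonzero rows after noise :", np.count_nonzero(np.any(noisy_grad != 0, axis=1)))
```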
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy guarantees.
We propose to randomly sparsify gradients and utilize RS to reduce communication cost and strengthen the privacy bound.
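A minimal sketch of the idea named in the title, assuming the simplest form of random sparsification: each coordinate of a per-sample gradient is kept with probability q before clipping and noising. The keep rate, clipping norm, and noise scale are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(3)

def rs_dp_gradient(per_sample_grads, q, C, sigma, rng):
    """Randomly sparsify per-sample gradients, then clip, average, and add noise."""
    mask = rng.random(per_sample_grads.shape) < q        # keep each coordinate w.p. q
    sparse = per_sample_grads * mask
    norms = np.linalg.norm(sparse, axis=1)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    clipped_mean = (sparse * scale[:, None]).mean(axis=0)
    noise = rng.normal(0.0, sigma * C / len(per_sample_grads), size=sparse.shape[1])
    return clipped_mean + noise

batch = rng.normal(size=(8, 10))                         # stand-in per-sample gradients
print(rs_dp_gradient(batch, q=0.5, C=1.0, sigma=1.0, rng=rng))
```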
arXiv Detail & Related papers (2021-12-01T21:43:34Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for
Private Learning [74.73901662374921]
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - Bypassing the Ambient Dimension: Private SGD with Gradient Subspace
Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
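A sketch of the projection step described above, under strong simplifying assumptions: a fixed orthonormal basis V of a k-dimensional subspace is given, noise is added in k dimensions instead of p, and the noisy low-dimensional gradient is mapped back. How the subspace is identified, which is the key contribution of Projected DP-SGD (and, in a different way, of GEP above), is not shown.
```python
import numpy as np

rng = np.random.default_rng(4)
p, k = 1000, 20                      # ambient dimension vs. subspace dimension (assumed)

# Assumed fixed orthonormal basis of the gradient subspace (columns of V).
V, _ = np.linalg.qr(rng.normal(size=(p, k)))

def projected_dp_gradient(g, V, C, sigma, rng):
    """Clip in the ambient space, then privatise the k-dimensional projection."""
    g = g * min(1.0, C / np.linalg.norm(g))        # bound the sensitivity at C
    z = V.T @ g                                    # project to k dimensions
    z_noisy = z + rng.normal(0.0, sigma * C, size=k)
    return V @ z_noisy                             # map back to the ambient space

g = rng.normal(size=p)
print("noise dimensions added:", k, "instead of", p)
print("estimate norm:", np.linalg.norm(projected_dp_gradient(g, V, C=1.0, sigma=1.0, rng=rng)))
```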
arXiv Detail & Related papers (2020-07-07T22:31:01Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with
Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)