Differentially Private Sharpness-Aware Training
- URL: http://arxiv.org/abs/2306.05651v1
- Date: Fri, 9 Jun 2023 03:37:27 GMT
- Title: Differentially Private Sharpness-Aware Training
- Authors: Jinseong Park, Hoki Kim, Yujin Choi, Jaewook Lee
- Abstract summary: Training deep learning models with differential privacy (DP) results in a degradation of performance.
We show that flat minima can help reduce the negative effects of per-example gradient clipping.
We propose a new sharpness-aware training method that mitigates the privacy-optimization trade-off.
- Score: 5.488902352630076
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training deep learning models with differential privacy (DP) results in a
degradation of performance. The training dynamics of models with DP show a
significant difference from standard training, whereas understanding the
geometric properties of private learning remains largely unexplored. In this
paper, we investigate sharpness, a key factor in achieving better
generalization, in private learning. We show that flat minima can help reduce
the negative effects of per-example gradient clipping and the addition of
Gaussian noise. We then verify the effectiveness of Sharpness-Aware
Minimization (SAM) for seeking flat minima in private learning. However, we
also discover that SAM is detrimental to the privacy budget and computational
time due to its two-step optimization. Thus, we propose a new sharpness-aware
training method that mitigates the privacy-optimization trade-off. Our
experimental results demonstrate that the proposed method improves the
performance of deep learning models with DP, both when training from scratch and when fine-tuning.
Code is available at https://github.com/jinseongP/DPSAT.
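For context, the trade-off described in the abstract can be made concrete: DP-SGD clips each per-example gradient and adds Gaussian noise, while SAM requires a second gradient computation at an adversarially perturbed point, so a naive combination doubles both the compute and the number of private gradient queries per step. Below is a minimal, illustrative sketch of such a naive DP-SGD + SAM step; the per-example loop, the hyperparameters, and the `model`/`loss_fn`/`batch` names are assumptions for illustration only, and this is not the authors' DPSAT implementation (see their repository above for that).

```python
import torch

def naive_dp_sam_step(model, loss_fn, batch, clip_norm=1.0, noise_mult=1.0,
                      rho=0.05, lr=0.1):
    """One naive DP-SGD step with a SAM-style ascent/descent pair.

    Illustrative only: per-example gradients are computed with a Python loop
    for clarity (real implementations vectorize this), and both the ascent
    and descent directions are privatized, i.e. two private gradient queries
    per step. Not the DPSAT algorithm from the paper above.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    inputs, targets = batch

    def private_grad():
        # Sum of per-example gradients, each clipped to clip_norm, plus noise.
        summed = [torch.zeros_like(p) for p in params]
        for x, y in zip(inputs, targets):
            model.zero_grad()
            loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
            grads = [p.grad.detach().clone() for p in params]
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
            for s, g in zip(summed, grads):
                s.add_(g * scale)
        noise_std = noise_mult * clip_norm
        return [(s + noise_std * torch.randn_like(s)) / len(inputs)
                for s in summed]

    # SAM ascent: move to a nearby point of (approximately) higher loss.
    g_ascent = private_grad()                       # 1st private gradient query
    g_norm = torch.sqrt(sum(g.pow(2).sum() for g in g_ascent))
    with torch.no_grad():
        eps = [rho * g / (g_norm + 1e-12) for g in g_ascent]
        for p, e in zip(params, eps):
            p.add_(e)

    # SAM descent: gradient at the perturbed point, then undo the perturbation.
    g_descent = private_grad()                      # 2nd private gradient query
    with torch.no_grad():
        for p, e, g in zip(params, eps, g_descent):
            p.sub_(e)             # restore the original weights
            p.add_(-lr * g)       # plain SGD update with the private gradient
```

In this naive form every training step releases two privatized gradients, which is exactly the privacy-budget and runtime overhead the abstract attributes to SAM's two-step optimization and which the proposed sharpness-aware method aims to avoid.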
Related papers
- DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z)
- DP$^2$-FedSAM: Enhancing Differentially Private Federated Learning Through Personalized Sharpness-Aware Minimization [8.022417295372492]
Federated learning (FL) is a distributed machine learning approach that allows multiple clients to collaboratively train a model without sharing their raw data.
To prevent sensitive information from being inferred through the model updates shared in FL, differentially private federated learning (DPFL) has been proposed.
DPFL ensures formal and rigorous privacy protection in FL by clipping and adding random noise to the shared model updates (a generic sketch of this clip-and-noise aggregation is given after this list).
We propose DP$^2$-FedSAM: Differentially Private and Personalized Federated Learning with Sharpness-Aware Minimization.
arXiv Detail & Related papers (2024-09-20T16:49:01Z)
- Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training [27.1846697092374]
Pre-training exploits public datasets to pre-train an advanced machine learning model.
We are the first to explore how model pre-training can mitigate noise detriment in differentially private federated learning.
arXiv Detail & Related papers (2024-08-18T13:48:10Z)
- Towards the Flatter Landscape and Better Generalization in Federated Learning under Client-level Differential Privacy [67.33715954653098]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM integrates Sharpness-Aware Minimization (SAM) to generate locally flat models with stability and weight robustness.
To further reduce the magnitude of the random noise while achieving better performance, we propose DP-FedSAM-$top_k$ by adopting a local update sparsification technique.
arXiv Detail & Related papers (2023-05-01T15:19:09Z)
- Enforcing Privacy in Distributed Learning with Performance Guarantees [57.14673504239551]
We study the privatization of distributed learning and optimization strategies.
We show that the popular additive random perturbation scheme degrades performance because it is not well-tuned to the graph structure.
arXiv Detail & Related papers (2023-01-16T13:03:27Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures the convergence to a flat minimum with improved capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing.
We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)
- Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
arXiv Detail & Related papers (2020-10-03T19:02:10Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered by a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
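Several of the federated entries above (DP$^2$-FedSAM, DP-FedSAM) build on the same client-level mechanism: each client's model update is clipped to a fixed norm, the clipped updates are aggregated, and Gaussian noise calibrated to that norm is added before updating the global model. As referenced in the DP$^2$-FedSAM summary, here is a minimal sketch of that clip-and-noise aggregation; all function and variable names are hypothetical and the code is not taken from any of the cited implementations.

```python
import torch

def dp_federated_aggregate(global_params, client_updates,
                           clip_norm=1.0, noise_mult=1.0):
    """Client-level DP aggregation sketch: clip each client's update to
    clip_norm, sum, add Gaussian noise scaled by noise_mult * clip_norm,
    and average. Hypothetical names; illustrative only."""
    n = len(client_updates)
    summed = [torch.zeros_like(p) for p in global_params]
    for update in client_updates:                 # update: list of tensors
        norm = torch.sqrt(sum(u.pow(2).sum() for u in update))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, u in zip(summed, update):
            s.add_(u * scale)                     # clipped client contribution
    noise_std = noise_mult * clip_norm
    return [p + (s + noise_std * torch.randn_like(s)) / n
            for p, s in zip(global_params, summed)]
```

The clipping bounds each client's influence on the aggregate, which is what lets the added Gaussian noise yield a client-level guarantee; the sharpness-aware variants above mainly change how the local updates are computed before this aggregation step.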
This list is automatically generated from the titles and abstracts of the papers in this site.