Sparsity-Preserving Differentially Private Training of Large Embedding Models
- URL: http://arxiv.org/abs/2311.08357v1
- Date: Tue, 14 Nov 2023 17:59:51 GMT
- Title: Sparsity-Preserving Differentially Private Training of Large Embedding Models
- Authors: Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
- Abstract summary: DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the use of large embedding models in recommendation systems and language
applications increases, concerns over user data privacy have also risen.
DP-SGD, a training algorithm that combines differential privacy with stochastic
gradient descent, has been the workhorse in protecting user privacy without
compromising model accuracy by much. However, applying DP-SGD naively to
embedding models can destroy gradient sparsity, leading to reduced training
efficiency. To address this issue, we present two new algorithms, DP-FEST and
DP-AdaFEST, that preserve gradient sparsity during private training of large
embedding models. Our algorithms achieve substantial reductions ($10^6 \times$)
in gradient size, while maintaining comparable levels of accuracy, on benchmark
real-world datasets.
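To make the sparsity problem concrete, here is a minimal Python sketch (standard library only). It assumes a toy embedding table with `vocab_size` rows, per-example gradients stored as dicts of the rows each example looked up, and hypothetical helpers `naive_dp_sgd` and `sparse_dp_sgd`. The naive path is ordinary DP-SGD (per-example clipping, summing, Gaussian noise on every coordinate), so every row of the update becomes nonzero. The sparse path is an illustrative stand-in, not the authors' DP-FEST or DP-AdaFEST: it assumes rows are chosen by thresholding a noisy per-row count, and then adds gradient noise only to those rows. No privacy accounting is done, and the noise scales are placeholders.

```python
import math
import random

# Toy representation (an assumption for illustration): one example's gradient is a
# dict {row_id: [float] * dim} holding only the embedding rows that example looked up.


def clip_example(grad, clip_norm):
    """Per-example clipping: scale the whole sparse gradient to L2 norm <= clip_norm."""
    norm = math.sqrt(sum(x * x for row in grad.values() for x in row))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return {r: [scale * x for x in row] for r, row in grad.items()}


def sum_clipped(per_example, clip_norm, dim):
    """Sum of clipped per-example gradients; still sparse (only touched rows)."""
    total = {}
    for grad in per_example:
        for r, row in clip_example(grad, clip_norm).items():
            acc = total.setdefault(r, [0.0] * dim)
            for j, x in enumerate(row):
                acc[j] += x
    return total


def add_noise(total, rows, dim, std, rng):
    """Add N(0, std^2) to every coordinate of each listed row (missing rows count as zero)."""
    zero = [0.0] * dim
    return {r: [x + rng.gauss(0.0, std) for x in total.get(r, zero)] for r in rows}


def naive_dp_sgd(per_example, vocab_size, dim, clip_norm, sigma, rng):
    """Plain DP-SGD on the table: noise goes on all vocab_size rows, so the update is dense."""
    total = sum_clipped(per_example, clip_norm, dim)
    return add_noise(total, range(vocab_size), dim, sigma * clip_norm, rng)


def sparse_dp_sgd(per_example, vocab_size, dim, clip_norm, sigma,
                  count_clip, count_sigma, threshold, rng):
    """Hypothetical sparsity-preserving variant (NOT the paper's DP-FEST/DP-AdaFEST)."""
    # 1) Each example contributes 1 to every row it touched, scaled so that its
    #    contribution vector has L2 norm <= count_clip.
    counts = [0.0] * vocab_size
    for grad in per_example:
        if not grad:
            continue
        w = min(1.0, count_clip / math.sqrt(len(grad)))
        for r in grad:
            counts[r] += w
    # 2) Noise every row's count, touched or not, and keep rows above the threshold.
    selected = [r for r in range(vocab_size)
                if counts[r] + rng.gauss(0.0, count_sigma * count_clip) > threshold]
    # 3) Ordinary DP-SGD noise, but only on the selected rows.
    total = sum_clipped(per_example, clip_norm, dim)
    return add_noise(total, selected, dim, sigma * clip_norm, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, dim = 10_000, 4
    # 256 examples, each looking up 3 distinct ids among the 20 most popular ones.
    batch = [{r: [rng.gauss(0.0, 1.0) for _ in range(dim)]
              for r in rng.sample(range(20), 3)}
             for _ in range(256)]
    dense = naive_dp_sgd(batch, vocab_size, dim, 1.0, 1.0, rng)
    sparse = sparse_dp_sgd(batch, vocab_size, dim, 1.0, 1.0, 1.0, 1.0, 5.0, rng)
    print(len(dense), "rows updated (naive);", len(sparse), "rows updated (sparse)")
```

With these toy settings the naive update always covers all 10,000 rows, while the sparse one typically keeps only the roughly 20 rows the batch actually used. Counts are noised for every row, including untouched ones, because choosing only among touched rows would itself reveal which ids appeared in the batch.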
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv (2024-11-05T12:47:30Z)
- Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization
Training machine learning models based on neural networks requires large datasets, which may contain sensitive information.
Differentially private SGD (DP-SGD) requires modifying the standard stochastic gradient descent (SGD) algorithm used to train new models.
A novel regularization strategy is proposed to achieve the same goal in a more efficient manner.
arXiv (2024-09-25T17:59:32Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv (2024-01-09T03:53:59Z)
- Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models
We show that small, efficient architectures can outperform current state-of-the-art models with substantially lower computational requirements.
Our results are a step towards efficient model architectures that make optimal use of their parameters.
arXiv (2023-01-30T17:43:47Z)
- Large Scale Transfer Learning for Differentially Private Image Classification
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this approach is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv (2021-10-12T01:45:27Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning
The utility of a differentially private model degrades drastically when the model has a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy.
arXiv (2021-02-25T04:29:58Z)
- Improving Deep Learning with Differential Privacy using Gradient Encoding and Denoising
In this paper, we aim at training deep learning models with differential privacy guarantees.
Our key technique is to encode gradients to map them to a smaller vector space.
We show that our mechanism outperforms the state-of-the-art DP-SGD.
arXiv (2020-07-22T16:33:14Z)
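The memory-saving clipping summarized for "Large Language Models Can Be Strong Differentially Private Learners" relies on computing each example's gradient norm without building that example's gradient. Below is a minimal Python sketch of one well-known identity that makes this possible for a linear layer. The single-layer setting, the names `acts` and `grads`, and the helper functions are assumptions for illustration; this is not claimed to be that paper's implementation.

```python
import math

# Assumed setting: one linear layer y_t = W a_t applied at T positions of a single
# example; acts[t] is a_t (layer input) and grads[t] is dL/dy_t (output gradient).
# That example's weight gradient is sum_t outer(grads[t], acts[t]).


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ghost_norm_sq(acts, grads):
    """||sum_t g_t a_t^T||_F^2 = sum_{t,s} (a_t . a_s)(g_t . g_s); no gradient matrix built."""
    T = len(acts)
    return sum(dot(acts[t], acts[s]) * dot(grads[t], grads[s])
               for t in range(T) for s in range(T))


def explicit_norm_sq(acts, grads):
    """Reference: materialize the out_dim x in_dim per-example gradient, then take its norm."""
    out_dim, in_dim = len(grads[0]), len(acts[0])
    G = [[sum(g[i] * a[j] for a, g in zip(acts, grads)) for j in range(in_dim)]
         for i in range(out_dim)]
    return sum(x * x for row in G for x in row)


def clip_factor(acts, grads, clip_norm):
    """DP-SGD clipping multiplier for this example, computed from the ghost norm."""
    norm = math.sqrt(ghost_norm_sq(acts, grads))
    return min(1.0, clip_norm / norm) if norm > 0 else 1.0


if __name__ == "__main__":
    acts = [[1.0, 2.0], [0.0, 1.0]]   # T = 2 positions, in_dim = 2
    grads = [[1.0], [3.0]]            # out_dim = 1
    print(ghost_norm_sq(acts, grads), explicit_norm_sq(acts, grads))  # 26.0 26.0
```

The ghost version keeps T x T dot products per example instead of an out_dim x in_dim matrix, so it saves memory when sequences are short relative to the layer width. One way to use the resulting factor is to reweight each example's loss and backpropagate a second time.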