Training Large ASR Encoders with Differential Privacy
- URL: http://arxiv.org/abs/2409.13953v1
- Date: Sat, 21 Sep 2024 00:01:49 GMT
- Title: Training Large ASR Encoders with Differential Privacy
- Authors: Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, Arun Narayanan
- Abstract summary: Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR.
With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data.
This paper is the first to apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and to study its performance on a downstream ASR task, assuming the fine-tuning data is public.
- Score: 18.624449993983106
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR. With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data. In this paper, we apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and study its performance on a downstream ASR task assuming the fine-tuning data is public. This paper is the first to apply DP to SSL for ASR, investigating the DP noise tolerance of the BEST-RQ pre-training method. Notably, we introduce a novel variant of model pruning called gradient-based layer freezing that provides strong improvements in privacy-utility-compute trade-offs. Our approach yields a LibriSpeech test-clean/other WER (%) of 3.78/8.41 with (10, 1e-9)-DP for extrapolation towards low dataset scales, and 2.81/5.89 with (10, 7.9e-11)-DP for extrapolation towards high scales.
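The abstract only names gradient-based layer freezing without giving details, so the following is a speculative PyTorch sketch of one plausible reading: rank parameter tensors by the gradient magnitude observed on a few probe batches and freeze the weakest fraction, so that DP training clips and noises fewer parameters. The names `probe_batches` and `freeze_fraction`, and the per-tensor granularity, are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

def accumulate_grad_norms(model: nn.Module, loss_fn, probe_batches):
    """Sum each parameter tensor's gradient L2 norm over a few probe batches."""
    norms = {name: 0.0 for name, _ in model.named_parameters()}
    for x, y in probe_batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                norms[name] += p.grad.norm().item()
    return norms

def freeze_low_gradient_params(model: nn.Module, norms: dict,
                               freeze_fraction: float = 0.5):
    """Freeze the tensors with the smallest accumulated gradient norms; frozen
    tensors drop out of DP training, so they get neither clipping nor noise."""
    ranked = sorted(norms, key=norms.get)                      # ascending norm
    frozen = set(ranked[: int(len(ranked) * freeze_fraction)])
    for name, p in model.named_parameters():
        p.requires_grad_(name not in frozen)
```

Freezing shrinks the dimensionality that DP noise must cover, which is one way such a scheme could improve the privacy-utility-compute trade-off the abstract reports.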
Related papers
- Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models [21.1164927055712]
Large ASR models can inadvertently leak sensitive information, which can be mitigated by formal privacy measures like differential privacy (DP).
Our study explores DP parameter-efficient fine-tuning as a way to mitigate privacy risks with smaller computation and performance costs for ASR models; a generic adapter sketch follows this entry.
arXiv Detail & Related papers (2024-10-02T18:49:15Z)
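To make the parameter-efficient idea concrete, here is a generic LoRA-style adapter in PyTorch: the base layer is frozen and only the low-rank matrices would be handed to a DP optimizer. This module and its rank are illustrative assumptions, not the paper's exact adapter design.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual path."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)       # frozen: exempt from DP clipping/noise
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B

# Only adapter parameters are trained privately, shrinking the DP update surface:
# trainable = [p for p in model.parameters() if p.requires_grad]
```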
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for quantifying the degree of security a trained model provides.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data; a rough sketch of the two-phase idea follows this entry.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
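A rough sketch of the two-phase idea, under stated assumptions (the summary above only gives the 10% public-data figure): warm-start non-privately on the public split, then continue pre-training on private data with clipped, noised gradients. For brevity the private phase clips the mini-batch gradient as a whole; real DP-SGD clips per example, as flagged in the comments.

```python
import torch

def continual_dp_pretrain(model, loss_fn, public_loader, private_loader,
                          lr=0.1, C=1.0, sigma=1.0, public_epochs=1):
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    # Phase 1: ordinary pre-training on the small public split (no privacy cost).
    for _ in range(public_epochs):
        for x, y in public_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # Phase 2: continue on private data with clipping plus Gaussian noise.
    # Simplification: DP-SGD proper clips each example's gradient before summing.
    for x, y in private_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        torch.nn.utils.clip_grad_norm_(params, C)
        with torch.no_grad():
            for p in params:
                p.grad += torch.randn_like(p) * sigma * C / x.shape[0]
        opt.step()
```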
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent; a minimal DP-SGD step is sketched after this entry.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
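For reference, a minimal DP-SGD update, assuming per-example gradients have already been materialized (e.g., via torch.func.vmap over torch.func.grad): clip each example's gradient to norm C, sum, add Gaussian noise with standard deviation sigma * C, and take an averaged step. This generic step does not include the sparsity-preserving machinery of DP-FEST/DP-AdaFEST.

```python
import torch

def dp_sgd_step(params, per_example_grads, lr=0.1, C=1.0, sigma=1.0):
    """params: list of parameter tensors.
    per_example_grads: matching list of tensors shaped [batch, *param.shape]."""
    batch = per_example_grads[0].shape[0]
    # Global per-example L2 norm across all parameter tensors.
    sq_norms = sum(g.reshape(batch, -1).pow(2).sum(dim=1)
                   for g in per_example_grads)
    scale = C / sq_norms.sqrt().clamp(min=C)        # equals min(1, C / ||g_i||)
    with torch.no_grad():
        for p, g in zip(params, per_example_grads):
            s = scale.view(-1, *([1] * (g.dim() - 1)))
            clipped_sum = (g * s).sum(dim=0)        # sum of clipped gradients
            p -= lr * (clipped_sum + torch.randn_like(p) * sigma * C) / batch
```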
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition [51.232523987916636]
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models, by imposing noisy distortion on private data.
In this work, we extend PATE learning to work with dynamic patterns, namely speech, and perform a first experimental study on ASR to avoid acoustic data leakage; a minimal sketch of PATE's label aggregation follows this entry.
arXiv Detail & Related papers (2022-10-11T16:55:54Z)
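The core PATE mechanism, sketched for the standard classification setting: teachers trained on disjoint private shards each vote for a label, Laplace noise is added to the vote histogram, and the student learns only from the noisy argmax. Extending this to sequential speech outputs is the paper's subject and is not captured here; `gamma` is an assumed noise parameter.

```python
import numpy as np

def pate_aggregate(teacher_votes, num_classes, gamma=0.1,
                   rng=np.random.default_rng(0)):
    """teacher_votes: per-teacher predicted labels for one input example."""
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=num_classes)  # noisy histogram
    return int(counts.argmax())   # the only label ever released to the student
```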
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this protection is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training; a hedged sketch of one transfer-learning recipe follows this entry.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
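One hedged reading of the transfer-learning recipe: features come from a backbone pre-trained on public data and stay frozen, and only a small head is trained with DP-SGD, cutting both the noised dimension and the compute. Head-only training and `feat_dim` are assumptions for illustration, not the paper's exact setup.

```python
import torch.nn as nn

def build_private_classifier(backbone: nn.Module, feat_dim: int, num_classes: int):
    """Freeze publicly pre-trained features; only the head gets DP updates."""
    for p in backbone.parameters():
        p.requires_grad_(False)              # public knowledge: no privacy cost
    head = nn.Linear(feat_dim, num_classes)
    model = nn.Sequential(backbone, head)
    return model, list(head.parameters())    # hand these to a DP optimizer
```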
- Unlocking High-Accuracy Differentially Private Image Classification through Scale [45.93988209606857]
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
Previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks.
We demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought; one reported ingredient is sketched after this entry.
arXiv Detail & Related papers (2022-04-28T17:10:56Z)
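One ingredient reported to help DP-SGD at scale is augmentation multiplicity: averaging each example's gradient over several augmented views before per-example clipping, which lowers gradient variance at no extra privacy cost. The sketch builds the per-example loss that induces this averaging; `augment` is a placeholder transform and `K` an assumed multiplicity.

```python
import torch

def multiplicity_loss(model, per_view_loss, x, y, augment, K=4):
    """Average one example's loss over K augmented views; the example's
    gradient then equals the average of the K per-view gradients.
    per_view_loss must use reduction='none' (one loss value per view)."""
    views = torch.stack([augment(x) for _ in range(K)])  # [K, *x.shape]
    labels = y.expand(K, *y.shape)                       # repeat the label
    return per_view_loss(model(views), labels).mean()
```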
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients; the two-pass reweighting idea is sketched after this entry.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
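The memory-saving idea can be approximated in two passes, under simplifying assumptions: first obtain each example's gradient norm (done naively below; the paper avoids even this materialization via a "ghost" norm computation), then backpropagate a single weighted loss with weights min(1, C/norm_i), which yields the clipped aggregate gradient without storing per-example gradients in the second pass.

```python
import torch

def clipped_grad_two_pass(model, loss_fn, xs, ys, C=1.0):
    """After the call, each p.grad holds the sum of clipped per-example grads."""
    norms = []
    for x, y in zip(xs, ys):                   # pass 1: per-example grad norms
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        sq = sum(p.grad.pow(2).sum() for p in model.parameters()
                 if p.grad is not None)
        norms.append(sq.sqrt())
    weights = torch.stack([(C / n).clamp(max=1.0) for n in norms]).detach()
    model.zero_grad()                          # pass 2: one weighted backward
    losses = torch.stack([loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
                          for x, y in zip(xs, ys)])
    (weights * losses).sum().backward()
```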
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such large-batch training variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)