DPZero: Private Fine-Tuning of Language Models without Backpropagation
- URL: http://arxiv.org/abs/2310.09639v3
- Date: Thu, 6 Jun 2024 14:31:03 GMT
- Title: DPZero: Private Fine-Tuning of Language Models without Backpropagation
- Authors: Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
- Abstract summary: We introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates.
The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks.
- Score: 49.365749361283704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize training data, it is important to protect potentially sensitive information in the fine-tuning data from being regurgitated. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differentially private gradient descent suffers more as model size grows. To bridge this gap, we introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates. The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks. Our code is available at https://github.com/Liang137/DPZero.
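For intuition, here is a minimal sketch of the kind of private zeroth-order step the abstract describes: each example contributes only a scalar finite-difference estimate along a shared random direction, so clipping and noising act on scalars rather than on full gradients. The function names, default hyperparameters, and clipping scheme below are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal, illustrative sketch (not the paper's reference implementation)
# of a DPZero-style private zeroth-order step. All names and default
# hyperparameters here are assumptions for the example.
import numpy as np

def private_zeroth_order_step(theta, loss_fn, batch, lr=1e-3, mu=1e-3,
                              clip=1.0, noise_mult=1.0, rng=None):
    """One private step using only forward passes (no backpropagation)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(theta.shape[0])  # shared random direction

    # Each example contributes a scalar finite-difference estimate of the
    # directional derivative along u: two forward passes, no gradients.
    scalars = [
        (loss_fn(theta + mu * u, ex) - loss_fn(theta - mu * u, ex)) / (2 * mu)
        for ex in batch
    ]

    # Clip each scalar, sum, and add Gaussian noise calibrated to the clip
    # threshold. Because the privatized quantity is one-dimensional, the
    # injected noise does not grow with the number of model parameters.
    clipped_sum = sum(np.clip(s, -clip, clip) for s in scalars)
    noisy_avg = (clipped_sum
                 + noise_mult * clip * rng.standard_normal()) / len(scalars)

    return theta - lr * noisy_avg * u
```

Privatizing a scalar rather than a d-dimensional gradient is what makes nearly dimension-independent rates plausible: standard DP-SGD must instead add noise to every coordinate of the model.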
Related papers
- Thinking Forward: Memory-Efficient Federated Finetuning of Language Models [21.438831528354513]
Finetuning large language models (LLMs) in federated learning settings requires excessive memory for resource-constrained devices.
In this paper, we introduce Spry, an FL algorithm that splits trainable weights of an LLM among participating clients.
Spry achieves a low memory footprint, high accuracy, and fast convergence.
arXiv Detail & Related papers (2024-05-24T13:37:48Z)
- Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning [0.0]
Differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets.
Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods remain constrained by the inherent inefficiency of SGD.
arXiv Detail & Related papers (2024-02-12T17:24:15Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent (see the sketch after this list).
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
- Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices [14.604785223644718]
Federated Learning (FL) is a technique to train models using data distributed across devices.
Differential Privacy (DP) provides a formal privacy guarantee for sensitive data.
We propose Partial Embedding Updates (PEU) to decrease noise by decreasing payload size.
arXiv Detail & Related papers (2022-07-18T23:53:17Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this guarantee is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
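Several entries above refer to DP-SGD's per-example clipping and noise injection, and the last entry to clipping without materializing per-example gradients. The sketch below illustrates both on a toy linear least-squares model; it is a simplified, assumption-laden example, not any of these papers' implementations.

```python
# A minimal DP-SGD sketch for a linear model, illustrating per-example
# clipping and Gaussian noising, plus computing per-example gradient norms
# without forming the per-example gradient matrix. All names and default
# hyperparameters are illustrative assumptions.
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for least-squares on a linear model y ~ X @ w."""
    rng = rng or np.random.default_rng()
    resid = X @ w - y                        # shape (n,)

    # The per-example gradient is resid[i] * X[i]; its L2 norm factorizes
    # as |resid[i]| * ||X[i]||, so clipping factors are available without
    # materializing the n-by-d per-example gradient matrix.
    norms = np.abs(resid) * np.linalg.norm(X, axis=1)
    scale = np.minimum(1.0, clip / (norms + 1e-12))

    grad_sum = X.T @ (scale * resid)         # sum of clipped gradients
    noise = noise_mult * clip * rng.standard_normal(w.shape)
    return w - lr * (grad_sum + noise) / len(y)
```

Note that the noise vector has one entry per model parameter; at LLM scale, that dimension dependence is precisely the cost the zeroth-order approaches above aim to avoid.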
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.