Scaling Laws for Differentially Private Language Models
- URL: http://arxiv.org/abs/2501.18914v1
- Date: Fri, 31 Jan 2025 06:32:46 GMT
- Title: Scaling Laws for Differentially Private Language Models
- Authors: Ryan McKenna, Yangsibo Huang, Amer Sinha, Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Badih Ghazi, George Kaissis, Ravi Kumar, Ruibo Liu, Da Yu, Chiyuan Zhang
- Abstract summary: Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale.
LLMs rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data.
Training models on this sensitive user data requires careful privacy protections like differential privacy (DP).
- Score: 53.14592585413073
- License:
- Abstract: Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs also rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently their scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.
Related papers
- LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws [21.053622641336744]
Loss-to-loss scaling laws relate losses across pretraining datasets and downstream tasks.
Our experiments reveal that the pretraining data and tokenizer determine the scaling trend.
arXiv Detail & Related papers (2025-02-17T18:45:25Z)
- The interplay between domain specialization and model size: a case study in the legal domain [8.653321928148547]
We investigate the interplay between domain and model size during continual pre-training under compute-constrained scenarios.
Our goal is to identify a compute-efficient training regime for this scenario.
As model size increases, the compute-effectiveness gap between specialized and general models widens.
arXiv Detail & Related papers (2025-01-03T19:28:53Z)
- Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families [43.36524246307057]
Scaling laws for large language models (LLMs) predict performance based on parameters like size and training data.
We propose Skills Scaling Laws (SSLaws), a novel scaling law that leverages publicly available benchmark data.
We present both theoretical results on parameter identification and empirical evaluations on 12 prominent benchmarks.
arXiv Detail & Related papers (2024-12-09T14:51:26Z)
- Scaling Law for Language Models Training Considering Batch Size [17.09348741898811]
Large language models (LLMs) have made remarkable advances in recent years, with scaling laws playing a critical role in this rapid progress.
We empirically investigate how a critical hyper-parameter, i.e., the global batch size, influences the LLM training process.
We establish a basic scaling law on model size and training data amount.
We then examine how varying batch sizes and learning rates affect the convergence and generalization of these models.
arXiv Detail & Related papers (2024-12-02T13:58:35Z)
- Optimization Hyper-parameter Laws for Large Language Models [52.49860340549727]
We present Opt-Laws, a framework that captures the relationship between hyper-parameters and training outcomes.
Our validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss.
This approach significantly reduces computational costs while enhancing overall model performance.
arXiv Detail & Related papers (2024-09-07T09:37:19Z)
- Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation.
We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users.
We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv Detail & Related papers (2024-08-07T04:20:28Z)
- DPZero: Private Fine-Tuning of Language Models without Backpropagation [49.365749361283704]
We introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates.
The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks.
arXiv Detail & Related papers (2023-10-14T18:42:56Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.