Related papers: Differentially Private Next-Token Prediction of Large Language Models

URL: http://arxiv.org/abs/2403.15638v3
Date: Fri, 26 Apr 2024 20:24:11 GMT
Title: Differentially Private Next-Token Prediction of Large Language Models
Authors: James Flemings, Meisam Razaviyayn, Murali Annavaram
Abstract summary: DP-SGD, which trains a model to guarantee Differential Privacy, overestimates an adversary's capabilities by assuming white-box access to the model. We present PMixED: a private prediction protocol for next-token prediction that utilizes the inherent stochasticity of next-token sampling and a public model to achieve Differential Privacy. Our results show that PMixED achieves a stronger privacy guarantee than sample-level privacy and outperforms DP-SGD for privacy $\epsilon = 8$ on large-scale datasets.
Score: 13.297381972044558
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique to accomplish this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities by assuming white-box access to the model and, as a result, causes longer training times and larger memory usage than SGD. On the other hand, commercial LLM deployments are predominantly cloud-based; hence, adversarial access to LLMs is black-box. Motivated by these observations, we present Private Mixing of Ensemble Distributions (PMixED): a private prediction protocol for next-token prediction that utilizes the inherent stochasticity of next-token sampling and a public model to achieve Differential Privacy. We formalize this by introducing RD-mollifiers, which project each model's output distribution from an ensemble of fine-tuned LLMs onto a set around a public LLM's output distribution, then average the projected distributions and sample from the average. Unlike DP-SGD, which needs to consider the model architecture during training, PMixED is model-agnostic, which makes it an appealing solution for current deployments. Our results show that PMixED achieves a stronger privacy guarantee than sample-level privacy and outperforms DP-SGD for privacy $\epsilon = 8$ on large-scale datasets. Thus, PMixED offers a practical alternative to DP training methods for achieving strong generative utility without compromising privacy.
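
The project-average-sample pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "set around the public distribution" is a Rényi-divergence ball of order alpha and a given radius, and that projection means mixing each fine-tuned model's distribution toward the public one until it lands inside that ball. The function names, the bisection search, and the parameters alpha and radius are all hypothetical choices made for illustration, and no privacy accounting is performed.

```python
import math
import random


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) for alpha > 1; q must be strictly positive."""
    total = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q))
    return math.log(total) / (alpha - 1.0)


def mix(p, q, lam):
    """Convex combination lam * p + (1 - lam) * q of two distributions."""
    return [lam * pi + (1.0 - lam) * qi for pi, qi in zip(p, q)]


def project_toward_public(p_private, p_public, alpha, radius, iters=50):
    """Assumed projection: largest mixing weight that keeps the result in the ball.

    Bisection is valid because the divergence of the mixture from p_public
    does not decrease as the weight on p_private grows.
    """
    if renyi_divergence(p_private, p_public, alpha) <= radius:
        return list(p_private)
    lo, hi = 0.0, 1.0  # lo always satisfies the constraint, hi never does
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if renyi_divergence(mix(p_private, p_public, mid), p_public, alpha) <= radius:
            lo = mid
        else:
            hi = mid
    return mix(p_private, p_public, lo)


def mixed_next_token(private_dists, p_public, alpha, radius, rng):
    """Project each ensemble distribution, average them, and sample one token id."""
    projected = [project_toward_public(p, p_public, alpha, radius) for p in private_dists]
    n = len(p_public)
    averaged = [sum(dist[i] for dist in projected) / len(projected) for i in range(n)]
    norm = sum(averaged)
    averaged = [a / norm for a in averaged]
    token = rng.choices(range(n), weights=averaged, k=1)[0]
    return token, averaged


# Toy vocabulary of four tokens; the distributions are made up for illustration.
public = [0.25, 0.25, 0.25, 0.25]
ensemble = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]]
token, averaged = mixed_next_token(ensemble, public, alpha=2.0, radius=0.05, rng=random.Random(0))
```

A smaller radius pulls every projected distribution closer to the public model's output, which in this sketch trades the fine-tuned models' influence for a tighter bound on how far any single model can shift the sampled token.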

Related papers

RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models [26.66607257183987]
We present RAPID: Retrieval Augmented PrIvate Diffusion model. It is a novel approach that integrates retrieval augmented generation into DPDM training. It significantly outperforms state-of-the-art approaches by large margins in generative quality, memory footprint, and inference cost.
arXiv Detail & Related papers (2025-02-18T11:56:51Z)
Adaptively Private Next-Token Prediction of Large Language Models [13.297381972044558]
We introduce a noisy screening mechanism that filters out queries with potentially expensive privacy loss. AdaPMixED can reduce the privacy loss by 16x while preserving the utility over the original PMixED.
arXiv Detail & Related papers (2024-10-02T20:34:24Z)
DP$^2$-FedSAM: Enhancing Differentially Private Federated Learning Through Personalized Sharpness-Aware Minimization [8.022417295372492]
Federated learning (FL) is a distributed machine learning approach that allows multiple clients to collaboratively train a model without sharing their raw data. To prevent sensitive information from being inferred through the model updates shared in FL, differentially private federated learning (DPFL) has been proposed. DPFL ensures formal and rigorous privacy protection in FL by clipping and adding random noise to the shared model updates. We propose DP$^2$-FedSAM: Differentially Private and Personalized Federated Learning with Sharpness-Aware Minimization.
arXiv Detail & Related papers (2024-09-20T16:49:01Z)
LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost. We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
Arbitrary Decisions are a Hidden Cost of Differentially Private Training [7.560688419767116]
Mechanisms used in machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data. For a given input example, the output predicted by equally-private models depends on the randomness used in training.
arXiv Detail & Related papers (2023-02-28T12:13:43Z)
Multi-Message Shuffled Privacy in Federated Learning [2.6778110563115542]
We study differentially private distributed optimization under communication constraints. A server using SGD for optimization aggregates the client-side local gradients for model updates using distributed mean estimation (DME). We develop a communication-efficient private DME, using the recently developed multi-message shuffled (MMS) privacy framework.
arXiv Detail & Related papers (2023-02-22T05:23:52Z)
Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy. Private training using DP-SGD protects against leakage by injecting noise into individual example gradients. However, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text. We show that the performance drop of DP learning can be mitigated with the use of large pretrained models. We propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. An adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.