Privacy Auditing of Large Language Models
- URL: http://arxiv.org/abs/2503.06808v1
- Date: Sun, 09 Mar 2025 23:32:15 GMT
- Title: Privacy Auditing of Large Language Models
- Authors: Ashwinee Panda, Xinyu Tang, Milad Nasr, Christopher A. Choquette-Choo, Prateek Mittal
- Abstract summary: We develop canaries that are far more effective than those used in prior work under threat models that cover a range of realistic settings. For measuring the memorization rate of non-privately trained LLMs, our designed canaries surpass prior approaches.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current techniques for privacy auditing of large language models (LLMs) have limited efficacy -- they rely on basic approaches to generate canaries which leads to weak membership inference attacks that in turn give loose lower bounds on the empirical privacy leakage. We develop canaries that are far more effective than those used in prior work under threat models that cover a range of realistic settings. We demonstrate through extensive experiments on multiple families of fine-tuned LLMs that our approach sets a new standard for detection of privacy leakage. For measuring the memorization rate of non-privately trained LLMs, our designed canaries surpass prior approaches. For example, on the Qwen2.5-0.5B model, our designed canaries achieve $49.6\%$ TPR at $1\%$ FPR, vastly surpassing the prior approach's $4.2\%$ TPR at $1\%$ FPR. Our method can be used to provide a privacy audit of $\varepsilon \approx 1$ for a model trained with theoretical $\varepsilon$ of 4. To the best of our knowledge, this is the first time that a privacy audit of LLM training has achieved nontrivial auditing success in the setting where the attacker cannot train shadow models, insert gradient canaries, or access the model at every iteration.
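The headline operating point above maps directly to an empirical privacy lower bound: under $(\varepsilon, \delta)$-DP, any membership inference attack must satisfy $\mathrm{TPR} \le e^{\varepsilon} \cdot \mathrm{FPR} + \delta$, so an observed (FPR, TPR) pair implies $\varepsilon \ge \log((\mathrm{TPR} - \delta)/\mathrm{FPR})$. A minimal sketch of that conversion (the helper name is ours; a rigorous audit would additionally apply confidence intervals, e.g. Clopper-Pearson, to the attack's success counts):

```python
import math

def eps_lower_bound(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """Empirical epsilon lower bound implied by an attack operating at
    (fpr, tpr), from the DP constraint TPR <= exp(eps) * FPR + delta."""
    if tpr <= delta or fpr <= 0.0:
        return 0.0  # attack too weak to certify any leakage
    return math.log((tpr - delta) / fpr)

# Operating point from the abstract: 49.6% TPR at 1% FPR.
# (On a non-privately trained model this measures memorization,
# not a violation of any DP guarantee.)
print(round(eps_lower_bound(0.496, 0.01), 2))  # 3.9
```

This is why strong canaries matter: a sharper attack at low FPR translates directly into a larger certified lower bound on $\varepsilon$.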
Related papers
- Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning [0.9861588522527782]
We present RLDP, the first framework to cast DP optimization itself as a closed-loop control problem amenable to modern deep reinforcement learning (RL). Across more than 1,600 experiments on GPT2-small, Llama-1B, Llama-3B, and Mistral-7B, RLDP delivers perplexity reductions of 1.3-3.0.5% and an average 5.6% downstream utility gain.
arXiv Detail & Related papers (2025-07-30T10:46:53Z) - Optimizing Canaries for Privacy Auditing with Metagradient Descent [32.69637681449977]
We study black-box privacy auditing, where the goal is to lower bound the privacy parameter of a differentially private learning algorithm. Our main contribution is a method for optimizing the auditor's canary set to improve privacy auditing. Our empirical evaluation demonstrates that by using such optimized canaries, we can improve empirical lower bounds for differentially private image classification models by over 2x in certain instances.
arXiv Detail & Related papers (2025-07-21T17:47:33Z) - Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios [5.116399056871577]
We introduce a novel auditing method that achieves tighter empirical lower bounds without additional assumptions. Our approach surpasses traditional canary-based adversarial approaches and is effective in final model-only scenarios.
arXiv Detail & Related papers (2024-12-02T17:52:16Z) - Nearly Tight Black-Box Auditing of Differentially Private Machine Learning [10.305660258428993]
This paper presents an auditing procedure for the Differentially Private Gradient Descent (DP-SGD) algorithm in the black-box threat model.
The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters.
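For reference, the mechanism under audit: DP-SGD clips each per-example gradient in L2 norm and adds Gaussian noise calibrated to the clipping bound, which is why its privacy analysis holds regardless of where the model starts. A pure-Python sketch of one step (illustrative only; `dp_sgd_step` and its defaults are our own naming, not the paper's procedure):

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_mult=1.0, lr=0.1, rng=None):
    """One DP-SGD update (sketch): clip each per-example gradient to
    clip_norm in L2, sum, add Gaussian noise with std noise_mult * clip_norm
    per coordinate, average, and take a gradient step."""
    rng = rng or random.Random(0)
    d, n = len(params), len(per_example_grads)
    noisy_sum = [0.0] * d
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(d):
            noisy_sum[j] += scale * g[j]
    for j in range(d):
        noisy_sum[j] += rng.gauss(0.0, noise_mult * clip_norm)
    return [p - lr * s / n for p, s in zip(params, noisy_sum)]
```

Because clipping and noising depend only on the gradients, not on the starting point, an auditor is free to pick worst-case initial parameters that make a canary's contribution maximally visible.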
arXiv Detail & Related papers (2024-05-23T02:24:52Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
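The zeroth-order idea can be sketched in a few lines: estimate a directional derivative from two forward passes, then clip and noise that single scalar rather than a full gradient vector. A simplified pure-Python sketch (the actual DP-ZO clips per-example scalar differences; the function name and defaults here are ours):

```python
import random

def dp_zo_step(params, loss_fn, lr=0.01, mu=1e-3,
               clip=1.0, noise_mult=1.0, rng=None):
    """One DP zeroth-order step (sketch): probe the loss along a random
    direction z, clip the scalar finite-difference estimate, add Gaussian
    noise to that single scalar, and step along z."""
    rng = rng or random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + mu * zi for p, zi in zip(params, z)]
    minus = [p - mu * zi for p, zi in zip(params, z)]
    g = (loss_fn(plus) - loss_fn(minus)) / (2 * mu)
    g = max(-clip, min(clip, g))            # clip the scalar estimate
    g += rng.gauss(0.0, noise_mult * clip)  # privatize one scalar per step
    return [p - lr * g * zi for p, zi in zip(params, z)]
```

Privatizing a single scalar per step, instead of a full gradient, is what makes the zeroth-order route attractive for large models.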
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Epsilon*: Privacy Metric for Machine Learning Models [7.461284823977013]
Epsilon* is a new metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies.
It requires only black-box access to model predictions, does not require training data re-sampling or model re-training, and can be used to measure the privacy risk of models not trained with differential privacy.
arXiv Detail & Related papers (2023-07-21T00:49:07Z) - Gaussian Membership Inference Privacy [22.745970468274173]
We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP). We derive a family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD).
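Under a $\mu$-GMIP-style Gaussian guarantee, the best achievable attack trade-off curve has the closed form $\mathrm{TPR} = \Phi(\Phi^{-1}(\mathrm{FPR}) + \mu)$, where $\Phi$ is the standard normal CDF. A self-contained sketch of that curve (the bisection inverse is for illustration only, not how $\Phi^{-1}$ is computed in practice):

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p: float) -> float:
    """Inverse standard normal CDF by bisection (illustration only)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def gmip_max_tpr(fpr: float, mu: float) -> float:
    """Best achievable attack TPR at a given FPR under a mu-Gaussian
    guarantee: TPR = Phi(Phi^{-1}(FPR) + mu)."""
    return phi(phi_inv(fpr) + mu)

# With mu = 1, any attack operating at 1% FPR is capped near 9% TPR.
print(round(gmip_max_tpr(0.01, 1.0), 3))  # ≈ 0.092
```

Setting $\mu = 0$ recovers the diagonal (TPR equals FPR), i.e. an attacker can do no better than random guessing.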
arXiv Detail & Related papers (2023-06-12T17:57:05Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Privately Fine-Tuning Large Language Models with Differential Privacy [10.485556506301549]
Pre-trained Large Language Models (LLMs) are an integral part of modern AI that have led to breakthrough performances in complex AI tasks.
Differential privacy (DP) provides a rigorous framework that allows adding noise in the process of training or fine-tuning LLMs.
We present ewtune, a DP framework for fine-tuning LLMs based on Edgeworth accountant with finite-sample privacy guarantees.
arXiv Detail & Related papers (2022-10-26T21:18:31Z) - CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.