Related papers: Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

URL: http://arxiv.org/abs/2406.14322v3
Date: Fri, 16 Aug 2024 15:02:45 GMT
Title: Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang,
Abstract summary: differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit. We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
Score: 62.224804688233
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.

Related papers

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models [25.266028200777317]
Speech Language Models (SLMs) are expected to distinguish between users to manage information flow appropriately.<n>Current SLM benchmarks test dialogue ability but overlook speaker identity.<n>We introduce VoxPrivacy, the first benchmark designed to evaluate interactional privacy in SLMs.
arXiv Detail & Related papers (2026-01-27T06:22:14Z)
Machine Learning with Privacy for Protected Attributes [56.44253915927481]
We refine the definition of differential privacy (DP) to create a more general and flexible framework that we call feature differential privacy (FDP)<n>Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary separation of protected and non-protected features.<n>We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available.
arXiv Detail & Related papers (2025-06-24T17:53:28Z)
Automated Privacy Information Annotation in Large Language Model Interactions [40.87806981624453]
Users interacting with large language models (LLMs) under their real identifiers often unknowingly risk disclosing private information.<n>Existing privacy detection methods were designed for different objectives and application scenarios.<n>We construct a large-scale multilingual dataset with 249K user queries and 154K annotated privacy phrases.
arXiv Detail & Related papers (2025-05-27T09:00:12Z)
Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied. Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
Personalized Differential Privacy for Ridge Regression [3.4751583941317166]
We introduce our novel Personalized-DP Output Perturbation method ( PDP-OP) that enables to train Ridge regression models with individual per data point privacy levels. We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model. We show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al.
arXiv Detail & Related papers (2024-01-30T16:00:14Z)
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer [57.04801796205638]
Large Language Models (LLMs) have emerged as dominant tools for various tasks. However, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information. We present Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge.
arXiv Detail & Related papers (2023-11-27T02:01:10Z)
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs. PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Personalized DP-SGD using Sampling Mechanisms [5.50042037663784]
We extend Differentially Private Gradient Descent (DP-SGD) to support a recent privacy notion called ($Phi$,$Delta$)- Personalized Differential Privacy (($Phi$,$Delta$)- PDP. Our algorithm uses a multi-round personalized sampling mechanism and embeds it within the DP-SGD iteration. Experiments on real datasets show that our algorithm outperforms DP-SGD and simple combinations of DP-SGD with existing PDP mechanisms.
arXiv Detail & Related papers (2023-05-24T13:56:57Z)
Have it your way: Individualized Privacy Assignment for DP-SGD [33.758209383275926]
We argue that setting a uniform privacy budget across all points may be overly conservative for some users or not sufficiently protective for others. We capture these preferences through individualized privacy budgets. We find it empirically improves privacy-utility trade-offs.
arXiv Detail & Related papers (2023-03-29T22:18:47Z)
How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss. We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off. With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server. In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z)
Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy preserving exploration policies for episodic reinforcement learning (RL) We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP) We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.