DP-TabICL: In-Context Learning with Differentially Private Tabular Data
- URL: http://arxiv.org/abs/2403.05681v1
- Date: Fri, 8 Mar 2024 21:19:01 GMT
- Title: DP-TabICL: In-Context Learning with Differentially Private Tabular Data
- Authors: Alycia N. Carey, Karuna Bhaila, Kennedy Edemacu, Xintao Wu
- Abstract summary: In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks.
LLMs can leak information contained in prompts.
This work serves as an initial investigation into how to use differential privacy (DP) to protect tabular data used in ICL.
We formulate two private ICL frameworks with provable privacy guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios.
- Score: 12.814878223075437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) enables large language models (LLMs) to adapt to
new tasks by conditioning on demonstrations of question-answer pairs and it has
been shown to have comparable performance to costly model retraining and
fine-tuning. Recently, ICL has been extended to allow tabular data to be used
as demonstration examples by serializing individual records into natural
language formats. However, it has been shown that LLMs can leak information
contained in prompts, and since tabular data often contain sensitive
information, understanding how to protect the underlying tabular data used in
ICL is a critical area of research. This work serves as an initial
investigation into how to use differential privacy (DP) -- the long-established
gold standard for data privacy and anonymization -- to protect tabular data
used in ICL. Specifically, we investigate the application of DP mechanisms for
private tabular ICL via data privatization prior to serialization and
prompting. We formulate two private ICL frameworks with provable privacy
guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios
via injecting noise into individual records or group statistics, respectively.
We evaluate our DP-based frameworks on eight real-world tabular datasets and
across multiple ICL and DP settings. Our evaluations show that DP-based ICL can
protect the privacy of the underlying tabular data while achieving comparable
performance to non-LLM baselines, especially under high privacy regimes.
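To make the two privatization routes above concrete, here is a minimal sketch under stated assumptions: the attribute names, the even per-attribute budget split, the add/remove-one-record sensitivity, and the serialization template are all illustrative choices, not necessarily the paper's exact design. The local route perturbs each record with k-ary randomized response before it is serialized into a demonstration; the global route has a trusted curator release Laplace-noised group counts from which demonstrations could be built.

```python
# Hedged sketch of data privatization prior to serialization and prompting.
import numpy as np

def randomized_response(value, domain, epsilon):
    """k-ary randomized response (local DP): keep the true value with
    probability e^eps / (e^eps + k - 1), otherwise report a uniformly
    random different value from the domain."""
    k = len(domain)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if np.random.rand() < p_keep:
        return value
    others = [v for v in domain if v != value]
    return others[np.random.randint(len(others))]

def ldp_privatize_record(record, domains, epsilon):
    """LDP route: perturb each categorical attribute of a single record,
    splitting the per-record budget evenly across attributes."""
    eps_attr = epsilon / len(record)
    return {a: randomized_response(v, domains[a], eps_attr) for a, v in record.items()}

def gdp_group_counts(records, attribute, epsilon):
    """GDP route: a trusted curator releases Laplace-noised counts per attribute
    value (Laplace scale set for add/remove-one-record sensitivity of 1)."""
    counts = {}
    for r in records:
        counts[r[attribute]] = counts.get(r[attribute], 0) + 1
    return {v: c + np.random.laplace(scale=1.0 / epsilon) for v, c in counts.items()}

def serialize(record, label):
    """Turn a (privatized) record into a natural-language ICL demonstration."""
    features = ", ".join(f"{a} is {v}" for a, v in record.items())
    return f"The {features}. Does this person earn over 50K? Answer: {label}"

# Toy Adult-style example (values and the prompt template are made up).
domains = {"workclass": ["Private", "Self-emp", "Gov"],
           "education": ["HS-grad", "Bachelors", "Masters"]}
record = {"workclass": "Private", "education": "Bachelors"}
print(serialize(ldp_privatize_record(record, domains, epsilon=1.0), "yes"))

records = [record, {"workclass": "Gov", "education": "HS-grad"}]
print(gdp_group_counts(records, "education", epsilon=1.0))
```

In the local setting each data owner could run `ldp_privatize_record` on their own record before anything leaves their hands, whereas the global setting assumes a trusted curator who sees the raw records, so only the noisy statistics are ever exposed to the LLM prompt.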
Related papers
- DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators [47.86275136491794]
We propose a two-stage fine-tuning framework for differentially private data generation.
The first stage involves non-private fine-tuning on a pseudo dataset, followed by DP fine-tuning on a private dataset.
Our results show that this approach improves performance across various settings and metrics compared to directly fine-tuned LLMs in DP contexts.
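The summary above only names the two stages; as a hedged illustration of what the private second stage typically involves (assuming a DP-SGD-style update, a common choice rather than the paper's confirmed recipe), the toy NumPy sketch below contrasts an ordinary gradient step with a per-example-clipped, Gaussian-noised step. The model, clip norm, noise multiplier, and learning rate are placeholders.

```python
# Hedged sketch: a stage-1 style non-private update vs. a stage-2 style
# DP-SGD update on a toy linear model. Hyperparameters are illustrative only.
import numpy as np

def nonprivate_step(w, X, y, lr=0.1):
    """Stage 1 style: ordinary gradient step on (pseudo) data."""
    grad = 2 * X.T @ (X @ w - y) / len(X)
    return w - lr * grad

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_multiplier=1.0):
    """Stage 2 style: per-example gradients are clipped to `clip`, summed,
    Gaussian noise scaled by noise_multiplier * clip is added, then averaged."""
    per_example = 2 * (X @ w - y)[:, None] * X            # shape (n, d)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example * np.minimum(1.0, clip / (norms + 1e-12))
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_multiplier * clip, size=w.shape)
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
w = nonprivate_step(w, X, y)   # stand-in for fine-tuning on the pseudo dataset
w = dp_sgd_step(w, X, y)       # stand-in for DP fine-tuning on the private dataset
```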
arXiv Detail & Related papers (2024-12-03T14:10:09Z)
- HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection [44.225151701532454]
In this paper, we introduce a new framework HARMONIC for tabular data generation and evaluation.
Our framework achieves equivalent performance to existing methods with better privacy, which also demonstrates our evaluation framework for the effectiveness of synthetic data and privacy risks.
arXiv Detail & Related papers (2024-08-06T03:21:13Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
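The summary doesn't spell out the modification; one simple bounded-support variant, sketched below under that assumption, is a rectified Gaussian mechanism that clamps the noisy release to a fixed interval. The interval and noise scale are illustrative, and the data-dependent privacy-amplification accounting is the paper's contribution rather than something this sketch reproduces.

```python
# Hedged sketch: standard Gaussian mechanism vs. a bounded-support (rectified)
# variant that clamps the release to [lo, hi]. Parameters are illustrative only.
import numpy as np

def gaussian_mechanism(value, sigma):
    """Standard Gaussian mechanism: unbounded output support."""
    return value + np.random.normal(scale=sigma)

def rectified_gaussian_mechanism(value, sigma, lo, hi):
    """Bounded-support variant: same noise, but the release is clamped to [lo, hi]."""
    return float(np.clip(value + np.random.normal(scale=sigma), lo, hi))

true_mean = 0.42                      # e.g. a clipped-mean statistic in [0, 1]
print(gaussian_mechanism(true_mean, sigma=0.1))
print(rectified_gaussian_mechanism(true_mean, sigma=0.1, lo=0.0, hi=1.0))
```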
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation [37.55812121348268]
In-context learning (ICL) with large language models (LLMs) on private datasets poses privacy risks.
We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy guarantees.
arXiv Detail & Related papers (2023-09-21T03:59:00Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- Privacy-Preserving In-Context Learning for Large Language Models [36.13851291571231]
In-context learning (ICL) is an important capability of Large Language Models (LLMs).
LLMs' responses may leak the sensitive private information contained in in-context exemplars.
We propose Differentially Private In-context Learning (DP-ICL), a general paradigm for privatizing ICL tasks.
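The summary above doesn't describe the mechanism behind DP-ICL; purely as an illustration of one standard pattern for privatizing ICL answers (which may differ from the paper's actual construction), the sketch below partitions the private exemplars, queries a model once per disjoint partition, and releases the label that wins a Laplace-noised vote. `query_llm` is a hypothetical stub standing in for a real model call.

```python
# Hedged sketch (not necessarily DP-ICL's mechanism): privatize an ICL answer
# via disjoint partitions and a report-noisy-max vote over per-partition answers.
import numpy as np

def query_llm(exemplars, question):
    """Hypothetical stand-in for a real LLM call; replace with an actual
    model/API. This stub just returns the majority label of its exemplars."""
    labels = [ex["label"] for ex in exemplars]
    return max(set(labels), key=labels.count)

def dp_icl_answer(exemplars, question, label_set, epsilon, n_partitions=10, rng=None):
    """Each exemplar influences a single vote; Laplace(2/epsilon) noise is added
    to every vote count before releasing the argmax label."""
    rng = rng or np.random.default_rng()
    partitions = np.array_split(np.array(exemplars, dtype=object), n_partitions)
    votes = {label: 0 for label in label_set}
    for part in partitions:
        answer = query_llm(list(part), question)
        if answer in votes:
            votes[answer] += 1
    noisy = {lab: c + rng.laplace(scale=2.0 / epsilon) for lab, c in votes.items()}
    return max(noisy, key=noisy.get)

# Toy usage with made-up exemplars.
exemplars = [{"text": f"record {i}", "label": "yes" if i % 3 else "no"} for i in range(40)]
print(dp_icl_answer(exemplars, "Does this person earn over 50K?", ["yes", "no"], epsilon=1.0))
```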
arXiv Detail & Related papers (2023-05-02T17:52:58Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)