DP-TabICL: In-Context Learning with Differentially Private Tabular Data
- URL: http://arxiv.org/abs/2403.05681v1
- Date: Fri, 8 Mar 2024 21:19:01 GMT
- Title: DP-TabICL: In-Context Learning with Differentially Private Tabular Data
- Authors: Alycia N. Carey, Karuna Bhaila, Kennedy Edemacu, Xintao Wu
- Abstract summary: In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks.
LLMs can leak information contained in prompts.
This work serves as an initial investigation into how to use differential privacy (DP) to protect tabular data used in ICL.
We formulate two private ICL frameworks with provable privacy guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios.
- Score: 12.814878223075437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) enables large language models (LLMs) to adapt to
new tasks by conditioning on demonstrations of question-answer pairs and it has
been shown to have comparable performance to costly model retraining and
fine-tuning. Recently, ICL has been extended to allow tabular data to be used
as demonstration examples by serializing individual records into natural
language formats. However, it has been shown that LLMs can leak information
contained in prompts, and since tabular data often contain sensitive
information, understanding how to protect the underlying tabular data used in
ICL is a critical area of research. This work serves as an initial
investigation into how to use differential privacy (DP) -- the long-established
gold standard for data privacy and anonymization -- to protect tabular data
used in ICL. Specifically, we investigate the application of DP mechanisms for
private tabular ICL via data privatization prior to serialization and
prompting. We formulate two private ICL frameworks with provable privacy
guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios
via injecting noise into individual records or group statistics, respectively.
We evaluate our DP-based frameworks on eight real-world tabular datasets and
across multiple ICL and DP settings. Our evaluations show that DP-based ICL can
protect the privacy of the underlying tabular data while achieving comparable
performance to non-LLM baselines, especially under high privacy regimes.
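To make the two privatization routes above concrete, here is a minimal sketch under stated assumptions: the attribute names, the even per-attribute budget split, the add/remove-one-record sensitivity, and the serialization template are all illustrative choices, not necessarily the paper's exact design. The local route perturbs each record with k-ary randomized response before it is serialized into a demonstration; the global route has a trusted curator release Laplace-noised group counts from which demonstrations could be built.

```python
# Hedged sketch of data privatization prior to serialization and prompting.
import numpy as np

def randomized_response(value, domain, epsilon):
    """k-ary randomized response (local DP): keep the true value with
    probability e^eps / (e^eps + k - 1), otherwise report a uniformly
    random different value from the domain."""
    k = len(domain)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if np.random.rand() < p_keep:
        return value
    others = [v for v in domain if v != value]
    return others[np.random.randint(len(others))]

def ldp_privatize_record(record, domains, epsilon):
    """LDP route: perturb each categorical attribute of a single record,
    splitting the per-record budget evenly across attributes."""
    eps_attr = epsilon / len(record)
    return {a: randomized_response(v, domains[a], eps_attr) for a, v in record.items()}

def gdp_group_counts(records, attribute, epsilon):
    """GDP route: a trusted curator releases Laplace-noised counts per attribute
    value (Laplace scale set for add/remove-one-record sensitivity of 1)."""
    counts = {}
    for r in records:
        counts[r[attribute]] = counts.get(r[attribute], 0) + 1
    return {v: c + np.random.laplace(scale=1.0 / epsilon) for v, c in counts.items()}

def serialize(record, label):
    """Turn a (privatized) record into a natural-language ICL demonstration."""
    features = ", ".join(f"{a} is {v}" for a, v in record.items())
    return f"The {features}. Does this person earn over 50K? Answer: {label}"

# Toy Adult-style example (values and the prompt template are made up).
domains = {"workclass": ["Private", "Self-emp", "Gov"],
           "education": ["HS-grad", "Bachelors", "Masters"]}
record = {"workclass": "Private", "education": "Bachelors"}
print(serialize(ldp_privatize_record(record, domains, epsilon=1.0), "yes"))

records = [record, {"workclass": "Gov", "education": "HS-grad"}]
print(gdp_group_counts(records, "education", epsilon=1.0))
```

In the local setting each data owner could run `ldp_privatize_record` on their own record before anything leaves their hands, whereas the global setting assumes a trusted curator who sees the raw records, so only the noisy statistics are ever exposed to the LLM prompt.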
Related papers
- DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators [47.86275136491794]
We propose a two-stage fine-tuning framework for differentially private data generation.
The first stage involves non-private fine-tuning on a pseudo dataset, followed by DP fine-tuning on a private dataset.
Our results show that this approach improves performance across various settings and metrics compared to directly fine-tuned LLMs in DP contexts.
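The summary above only names the two stages; as a hedged illustration of what the private second stage typically involves (assuming a DP-SGD-style update, a common choice rather than the paper's confirmed recipe), the toy NumPy sketch below contrasts an ordinary gradient step with a per-example-clipped, Gaussian-noised step. The model, clip norm, noise multiplier, and learning rate are placeholders.

```python
# Hedged sketch: a stage-1 style non-private update vs. a stage-2 style
# DP-SGD update on a toy linear model. Hyperparameters are illustrative only.
import numpy as np

def nonprivate_step(w, X, y, lr=0.1):
    """Stage 1 style: ordinary gradient step on (pseudo) data."""
    grad = 2 * X.T @ (X @ w - y) / len(X)
    return w - lr * grad

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_multiplier=1.0):
    """Stage 2 style: per-example gradients are clipped to `clip`, summed,
    Gaussian noise scaled by noise_multiplier * clip is added, then averaged."""
    per_example = 2 * (X @ w - y)[:, None] * X            # shape (n, d)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example * np.minimum(1.0, clip / (norms + 1e-12))
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_multiplier * clip, size=w.shape)
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
w = nonprivate_step(w, X, y)   # stand-in for fine-tuning on the pseudo dataset
w = dp_sgd_step(w, X, y)       # stand-in for DP fine-tuning on the private dataset
```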
arXiv Detail & Related papers (2024-12-03T14:10:09Z)
- HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection [44.225151701532454]
In this paper, we introduce a new framework HARMONIC for tabular data generation and evaluation.
Our framework achieves equivalent performance to existing methods with better privacy, which also demonstrates our evaluation framework for the effectiveness of synthetic data and privacy risks.
arXiv Detail & Related papers (2024-08-06T03:21:13Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
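The summary doesn't spell out the modification; one simple bounded-support variant, sketched below under that assumption, is a rectified Gaussian mechanism that clamps the noisy release to a fixed interval. The interval and noise scale are illustrative, and the data-dependent privacy-amplification accounting is the paper's contribution rather than something this sketch reproduces.

```python
# Hedged sketch: standard Gaussian mechanism vs. a bounded-support (rectified)
# variant that clamps the release to [lo, hi]. Parameters are illustrative only.
import numpy as np

def gaussian_mechanism(value, sigma):
    """Standard Gaussian mechanism: unbounded output support."""
    return value + np.random.normal(scale=sigma)

def rectified_gaussian_mechanism(value, sigma, lo, hi):
    """Bounded-support variant: same noise, but the release is clamped to [lo, hi]."""
    return float(np.clip(value + np.random.normal(scale=sigma), lo, hi))

true_mean = 0.42                      # e.g. a clipped-mean statistic in [0, 1]
print(gaussian_mechanism(true_mean, sigma=0.1))
print(rectified_gaussian_mechanism(true_mean, sigma=0.1, lo=0.0, hi=1.0))
```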
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation [37.55812121348268]
In-context learning (ICL) with large language models (LLMs) on private datasets poses privacy risks.
We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy guarantees.
arXiv Detail & Related papers (2023-09-21T03:59:00Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- Privacy-Preserving In-Context Learning for Large Language Models [36.13851291571231]
In-context learning (ICL) is an important capability of Large Language Models (LLMs).
LLMs' responses may leak the sensitive private information contained in in-context exemplars.
We propose Differentially Private In-context Learning (DP-ICL), a general paradigm for privatizing ICL tasks.
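The summary above doesn't describe the mechanism behind DP-ICL; purely as an illustration of one standard pattern for privatizing ICL answers (which may differ from the paper's actual construction), the sketch below partitions the private exemplars, queries a model once per disjoint partition, and releases the label that wins a Laplace-noised vote. `query_llm` is a hypothetical stub standing in for a real model call.

```python
# Hedged sketch (not necessarily DP-ICL's mechanism): privatize an ICL answer
# via disjoint partitions and a report-noisy-max vote over per-partition answers.
import numpy as np

def query_llm(exemplars, question):
    """Hypothetical stand-in for a real LLM call; replace with an actual
    model/API. This stub just returns the majority label of its exemplars."""
    labels = [ex["label"] for ex in exemplars]
    return max(set(labels), key=labels.count)

def dp_icl_answer(exemplars, question, label_set, epsilon, n_partitions=10, rng=None):
    """Each exemplar influences a single vote; Laplace(2/epsilon) noise is added
    to every vote count before releasing the argmax label."""
    rng = rng or np.random.default_rng()
    partitions = np.array_split(np.array(exemplars, dtype=object), n_partitions)
    votes = {label: 0 for label in label_set}
    for part in partitions:
        answer = query_llm(list(part), question)
        if answer in votes:
            votes[answer] += 1
    noisy = {lab: c + rng.laplace(scale=2.0 / epsilon) for lab, c in votes.items()}
    return max(noisy, key=noisy.get)

# Toy usage with made-up exemplars.
exemplars = [{"text": f"record {i}", "label": "yes" if i % 3 else "no"} for i in range(40)]
print(dp_icl_answer(exemplars, "Does this person earn over 50K?", ["yes", "no"], epsilon=1.0))
```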
arXiv Detail & Related papers (2023-05-02T17:52:58Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)