Noisy Neighbors: Efficient membership inference attacks against LLMs
- URL: http://arxiv.org/abs/2406.16565v1
- Date: Mon, 24 Jun 2024 12:02:20 GMT
- Title: Noisy Neighbors: Efficient membership inference attacks against LLMs
- Authors: Filippo Galli, Luca Melis, Tommaso Cucinotta,
- Abstract summary: This paper introduces an efficient methodology that generates textitnoisy neighbors for a target sample by adding noise in the embedding space.
Our findings demonstrate that this approach closely matches the effectiveness of employing shadow models, showing its usability in practical privacy auditing scenarios.
- Score: 2.666596421430287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The potential of transformer-based LLMs risks being hindered by privacy concerns due to their reliance on extensive datasets, possibly including sensitive information. Regulatory measures like GDPR and CCPA call for using robust auditing tools to address potential privacy issues, with Membership Inference Attacks (MIA) being the primary method for assessing LLMs' privacy risks. Differently from traditional MIA approaches, often requiring computationally intensive training of additional models, this paper introduces an efficient methodology that generates \textit{noisy neighbors} for a target sample by adding stochastic noise in the embedding space, requiring operating the target model in inference mode only. Our findings demonstrate that this approach closely matches the effectiveness of employing shadow models, showing its usability in practical privacy auditing scenarios.
Related papers
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment [8.028743532294532]
We introduce a novel reference-based attack framework specifically for analyzing preference data called PREMIA.
We provide empirical evidence that DPO models are more vulnerable to MIA compared to PPO models.
arXiv Detail & Related papers (2024-07-08T22:53:23Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning [32.52811740662061]
This article introduces DP-LoRA, a novel federated learning algorithm tailored for large language models (LLMs)
DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training.
arXiv Detail & Related papers (2023-12-29T06:50:38Z) - Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks (MIAs) aim to infer whether a target data record has been utilized for model training or not.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA)
Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal.
arXiv Detail & Related papers (2023-11-10T13:55:05Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - A Differentially Private Weighted Empirical Risk Minimization Procedure
and its Application to Outcome Weighted Learning [5.025486694392673]
We propose the first differentially private wERM algorithm, backed by a rigorous theoretical proof of its DP guarantees.
We evaluate the performance of the DP-wERM application to weighted learning (OWL) in a simulation study and in a real clinical trial of melatonin for sleep health.
arXiv Detail & Related papers (2023-07-24T21:03:25Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.