Membership Inference Attacks Against In-Context Learning
- URL: http://arxiv.org/abs/2409.01380v1
- Date: Mon, 2 Sep 2024 17:23:23 GMT
- Title: Membership Inference Attacks Against In-Context Learning
- Authors: Rui Wen, Zheng Li, Michael Backes, Yang Zhang,
- Abstract summary: We present the first membership inference attack tailored for In-Context Learning (ICL)
We propose four attack strategies tailored to various constrained scenarios.
We investigate three potential defenses targeting data, instruction, and output.
- Score: 26.57639819629732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., 95\% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95\% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers enhanced privacy assurances.
Related papers
- Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models [32.03992137755351]
This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs)
We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness.
arXiv Detail & Related papers (2024-07-12T14:26:14Z) - Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability [10.632831321114502]
We propose an attack-driven explainable framework to identify the most influential features of raw data that lead to successful membership inference attacks.
Our proposed MIA shows an improvement of up to 26% on state-of-the-art MIA.
arXiv Detail & Related papers (2024-07-01T14:07:46Z) - Evaluating and Safeguarding the Adversarial Robustness of Retrieval-Based In-Context Learning [21.018893978967053]
In-Context Learning (ICL) is sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt.
Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations.
Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks.
We introduce an effective training-free adversarial defence method, DARD, which enriches the example pool with those attacked samples.
arXiv Detail & Related papers (2024-05-24T23:56:36Z) - Goal-guided Generative Prompt Injection Attack on Large Language Models [6.175969971471705]
Large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks.
A large number of users can easily inject adversarial text or instructions through the user interface.
It is unclear how these strategies relate to the success rate of attacks and thus effectively improve model security.
arXiv Detail & Related papers (2024-04-06T06:17:10Z) - Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive
Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Curse or Redemption? How Data Heterogeneity Affects the Robustness of
Federated Learning [51.15273664903583]
Data heterogeneity has been identified as one of the key features in federated learning but often overlooked in the lens of robustness to adversarial attacks.
This paper focuses on characterizing and understanding its impact on backdooring attacks in federated learning through comprehensive experiments using synthetic and the LEAF benchmarks.
arXiv Detail & Related papers (2021-02-01T06:06:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.