Related papers: PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs

PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs

URL: http://arxiv.org/abs/2410.06704v1
Date: Wed, 9 Oct 2024 09:16:25 GMT
Title: PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs
Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou,
Abstract summary: We introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs. We extend our study to more realistic attack scenarios, exploring PII attacks that employ advanced adversarial strategies. We show that with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model.
Score: 8.98944128441731
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs across diverse threat settings. Our study provides a deeper understanding of these attacks by uncovering several hyperparameters (e.g., demonstration selection) crucial to their effectiveness. Building on this understanding, we extend our study to more realistic attack scenarios, exploring PII attacks that employ advanced adversarial strategies, including repeated and diverse querying, and leveraging iterative learning for continual PII extraction. Through extensive experimentation, our results reveal a notable underestimation of PII leakage in existing single-query attacks. In fact, we show that with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model. Moreover, we evaluate PII leakage on finetuned models, showing that they are more vulnerable to leakage than pretrained models. Overall, our work establishes a rigorous empirical benchmark for PII extraction attacks in realistic threat scenarios and provides a strong foundation for developing effective mitigation strategies.

Related papers

A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z)
Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning [4.275908952997288]
Federated Learning (FL) is a collaborative learning framework designed to protect client data.<n>Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to model extraction attacks.<n>This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks.
arXiv Detail & Related papers (2025-05-25T22:40:10Z)
Equally Critical: Samples, Targets, and Their Mappings in Datasets [6.859656302020063]
This paper investigates how sample and target collectively influence training dynamic.<n>We first establish a taxonomy of existing paradigms through the lens of sample-target interactions.<n>We then propose a novel unified loss framework to assess their impact on training efficiency.
arXiv Detail & Related papers (2025-05-17T08:27:19Z)
FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses [50.921333548391345]
Federated Learning is a privacy preserving decentralized machine learning paradigm. Recent research has revealed that private ground truth data can be recovered through a gradient technique known as Deep Leakage. This paper introduces the FEDLAD Framework (Federated Evaluation of Deep Leakage Attacks and Defenses), a comprehensive benchmark for evaluating Deep Leakage attacks and defenses.
arXiv Detail & Related papers (2024-11-05T11:42:26Z)
DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation [4.980393474423609]
Federated recommendation (FedRec) preserves user privacy by enabling decentralized training of personalized models, but this architecture is inherently vulnerable to adversarial attacks. We propose a novel dualview attack framework, named DV-FSR, which combines a sampling-based explicit strategy with a contrastive learning-based implicit gradient strategy to orchestrate a coordinated attack.
arXiv Detail & Related papers (2024-09-10T15:24:13Z)
Membership Inference Attacks Against In-Context Learning [26.57639819629732]
We present the first membership inference attack tailored for In-Context Learning (ICL) We propose four attack strategies tailored to various constrained scenarios. We investigate three potential defenses targeting data, instruction, and output.
arXiv Detail & Related papers (2024-09-02T17:23:23Z)
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Preference-Based Multi-Agent Reinforcement Learning (PbMARL) We identify the Nash equilibrium from a preference-only offline dataset in general-sum games. Our findings underscore the multifaceted approach required for PbMARL.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks. LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model. We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z)
Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses. C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks. This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks. We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks [72.03945355787776]
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv Detail & Related papers (2023-09-23T04:41:55Z)
The Space of Adversarial Strategies [6.295859509997257]
Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. We propose a systematic approach to characterize worst-case (i.e., optimal) adversaries.
arXiv Detail & Related papers (2022-09-09T20:53:11Z)
Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z)
Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations. We reformulate the problem of adversarial attacks in function space and separate the previous gradient based attacks into several subspaces. In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories routing to the lowest reward. Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.