PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
- URL: http://arxiv.org/abs/2407.02943v1
- Date: Wed, 3 Jul 2024 09:20:04 GMT
- Title: PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
- Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou
- Abstract summary: We show that it is possible to improve the extractability of personally identifiable information (PII) by over ten-fold by grounding the manually constructed extraction prompt with in-domain data.
Our approach achieves phone number extraction rates of 0.92%, 3.9%, and 6.86% with 1, 128, and 2308 queries, respectively, i.e., the phone number of 1 person in 15 is extractable.
- Score: 8.98944128441731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personally identifiable information (PII) contained in their training data. However, reported PII extraction performance varies widely, and there is no consensus on the optimal methodology to evaluate this risk, resulting in the underestimation of realistic adversaries. In this work, we empirically demonstrate that it is possible to improve the extractability of PII by over ten-fold by grounding the prefix of the manually constructed extraction prompt with in-domain data. Our approach, PII-Compass, achieves phone number extraction rates of 0.92%, 3.9%, and 6.86% with 1, 128, and 2308 queries, respectively, i.e., the phone number of 1 person in 15 is extractable.
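The attack described here is simple to prototype: instead of querying with a hand-written template alone, the adversary prepends a passage of in-domain text (data from the same distribution as the training set) to the template before sampling. Below is a minimal sketch of that grounding step, assuming a HuggingFace causal LM; the model name, grounding passage, template, and target identity are illustrative placeholders rather than the paper's actual setup.

```python
# Minimal sketch of grounded PII extraction (illustrative only; the model,
# grounding text, and target identity are placeholders, not the paper's setup).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # assumed stand-in for the attacked model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A manually constructed extraction template on its own...
template = "You can reach John Doe by phone at"
# ...is prepended with an in-domain passage (text resembling the training-data
# domain), which is the "grounding" that raises extraction rates.
in_domain_prefix = (
    "From: John Doe\nTo: Jane Roe\nSubject: contact details\n"
    "Hi Jane, here is my updated contact information.\n"
)
prompt = in_domain_prefix + template

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16, do_sample=False)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])

target_phone = "555-0123"  # ground-truth PII known only to the evaluator
print("hit" if target_phone in completion else "miss", "->", completion.strip())
```

An adversary repeats this over many grounding prefixes and templates, counting a hit whenever the true number appears; the multi-query rates quoted above aggregate exactly this kind of trial.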
Related papers
- Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF [67.48004037550064]
We propose an active learning approach to efficiently select prompt and preference pairs.
Our method evaluates the gradients of all potential preference annotations to assess their impact on model updates.
Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion.
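The summary above is high-level; one plain reading is that each candidate pair is scored by a risk-adjusted measure of its expected gradient impact, in the spirit of a Sharpe ratio (mean benefit over variability), and the top-scoring pairs are sent for annotation. The sketch below follows that reading; the scoring rule and the `estimate_grad_impacts` helper are assumptions, not the paper's interface.

```python
import numpy as np

def sharpe_score(grad_impacts: np.ndarray, eps: float = 1e-8) -> float:
    # Risk-adjusted value of a candidate pair: mean gradient impact divided
    # by its variability, analogous to a Sharpe ratio in finance.
    return grad_impacts.mean() / (grad_impacts.std() + eps)

def select_pairs(candidates, estimate_grad_impacts, k=32):
    # estimate_grad_impacts(pair) -> per-annotation impact estimates, e.g.
    # gradient norms under each possible preference label (assumed helper).
    scored = [(sharpe_score(estimate_grad_impacts(p)), p) for p in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:k]]
```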
arXiv Detail & Related papers (2025-03-28T04:22:53Z) - Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data [19.221998577357713]
Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process.
As the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need for pretraining with different data at various training stages.
We propose the Perplexity Difference (PD) based Preference Curriculum learning framework, which always perceives and uses the data preferred by LLMs to train and boost them.
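Concretely, a perplexity-difference signal can be computed with any pair of LM scorers: data that a stronger (or later-stage) model compresses well but the current model does not is what the curriculum feeds next. A minimal sketch under that assumption; the scorer interface and the exact PD normalization are placeholders, not the paper's definition.

```python
import math

def perplexity(logprob_fn, text: str) -> float:
    # logprob_fn returns the total log-likelihood of `text` in nats and its
    # token count (assumed interface; any LM scorer works here).
    logp, n_tokens = logprob_fn(text)
    return math.exp(-logp / n_tokens)

def perplexity_difference(weak_fn, strong_fn, text: str) -> float:
    # High PD: the stronger model handles the sample but the weaker one does
    # not yet, marking data the current model "prefers" at the next stage.
    ppl_weak = perplexity(weak_fn, text)
    ppl_strong = perplexity(strong_fn, text)
    return (ppl_weak - ppl_strong) / ppl_weak

def next_stage_batch(corpus, weak_fn, strong_fn, k=10_000):
    # Rank the corpus by PD and take the top-k samples for the next stage.
    key = lambda t: perplexity_difference(weak_fn, strong_fn, t)
    return sorted(corpus, key=key, reverse=True)[:k]
```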
arXiv Detail & Related papers (2025-01-21T13:12:13Z) - Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO.
Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically.
Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
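For reference, the underlying objective is the standard DPO loss applied to one reasoning step at a time: given a shared prefix (problem statement plus previous steps), the margin between the preferred and rejected step is compared under the policy and a frozen reference model. A compact PyTorch sketch of that loss; the batching and the upstream log-probability computation are assumed.

```python
import torch
import torch.nn.functional as F

def step_dpo_loss(policy_logps_w, policy_logps_l,
                  ref_logps_w, ref_logps_l, beta: float = 0.1):
    # Each argument is the summed log-probability of one reasoning step
    # (preferred `w` or rejected `l`) given the shared prefix, under the
    # policy or the frozen reference model (shape: [batch]).
    policy_margin = policy_logps_w - policy_logps_l
    ref_margin = ref_logps_w - ref_logps_l
    # Standard DPO objective, applied per step rather than per full answer.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```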
arXiv Detail & Related papers (2024-06-26T17:43:06Z) - Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources [13.750202656564907]
Adverse event (AE) extraction is crucial for monitoring and analyzing the safety profiles of immunizations.
This study aims to evaluate the effectiveness of large language models (LLMs) and traditional deep learning models in AE extraction.
arXiv Detail & Related papers (2024-06-26T03:56:21Z) - MAmmoTH2: Scaling Instructions from the Web [39.786198452175505]
We propose a paradigm to efficiently harvest 10 million naturally occurring instruction-data examples from the pre-training web corpus.
We build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks.
Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-05-06T15:11:38Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs can open a new avenue for automated attacks.
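The 23.7% figure above is an overlap statistic between model generations and training documents. One simple instance of such a metric is a longest-common-substring ratio, sketched here; the metric actually used in the paper may differ.

```python
from difflib import SequenceMatcher

def train_overlap(generation: str, training_doc: str) -> float:
    # Fraction of the generation covered by its longest block shared
    # verbatim with the training document (illustrative overlap metric).
    match = SequenceMatcher(None, generation, training_doc).find_longest_match(
        0, len(generation), 0, len(training_doc))
    return match.size / max(len(generation), 1)

# An attack is scored by averaging this overlap over many
# (generation, source-document) pairs, for instruction-based prompts
# versus the prefix-suffix baseline.
```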
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - ProPILE: Probing Privacy Leakage in Large Language Models [38.92840523665835]
Large language models (LLMs) are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data.
This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage.
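In spirit, such a probe conditions the model on PII the data subject already knows about themselves and measures how strongly the model predicts a further piece of PII. A hedged sketch of a likelihood-based probe along those lines; the prompt format, placeholder model, and scoring choice are assumptions rather than ProPILE's actual interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def leakage_score(known_pii: str, target_pii: str) -> float:
    # Average log-likelihood the model assigns to the target PII when
    # conditioned on the subject's known PII (higher = more likely leaked).
    prompt_ids = tok(known_pii, return_tensors="pt").input_ids
    target_ids = tok(target_pii, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = lm(ids).logits
    # Score only the target tokens, each predicted from its preceding context.
    logps = torch.log_softmax(logits[0, :-1], dim=-1)
    positions = range(prompt_ids.shape[1] - 1, ids.shape[1] - 1)
    token_lp = [logps[i, ids[0, i + 1]] for i in positions]
    return float(torch.stack(token_lp).mean())

print(leakage_score("Name: John Doe, Email: j.doe@example.com, Phone:", " 555-0123"))
```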
arXiv Detail & Related papers (2023-07-04T18:53:47Z) - Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning [14.228909822681373]
Large Language Models (LLMs) are known to memorize significant portions of their training data.
We present a novel approach which uses prompt-tuning to control the extraction rates of memorized content in LLMs.
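Prompt-tuning here means training a small block of continuous prompt embeddings while the LM itself stays frozen; flipping the sign of the training objective lets the same machinery either amplify extraction (useful for auditing) or suppress it (useful for defense). Below is a generic soft-prompt training step under those assumptions, not the paper's exact code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder frozen LM
lm = AutoModelForCausalLM.from_pretrained("gpt2")
for p in lm.parameters():
    p.requires_grad_(False)

n_virtual, dim = 20, lm.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(1, n_virtual, dim) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

def train_step(prefix: str, suffix: str, amplify: bool = True) -> float:
    # Cross-entropy of the memorized suffix given [soft prompt; prefix].
    ids = tok(prefix + suffix, return_tensors="pt").input_ids
    n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
    embeds = lm.get_input_embeddings()(ids)
    inputs = torch.cat([soft_prompt, embeds], dim=1)
    logits = lm(inputs_embeds=inputs).logits
    # Suffix tokens start at position n_virtual + n_prefix in the sequence;
    # each token is predicted by the logits one position earlier.
    tgt = ids[0, n_prefix:]
    pred = logits[0, n_virtual + n_prefix - 1 : n_virtual + ids.shape[1] - 1]
    loss = torch.nn.functional.cross_entropy(pred, tgt)
    # Minimizing the loss amplifies extraction; maximizing it suppresses it.
    (loss if amplify else -loss).backward()
    opt.step()
    opt.zero_grad()
    return float(loss)
```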
arXiv Detail & Related papers (2023-05-19T15:45:29Z) - Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z) - A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the small-data and large-data regimes differently.
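A power law models error as a * n^b, a straight line in log-log space; a piecewise variant lets the slope change at a breakpoint so that small-data and large-data behavior are fitted separately. A minimal numpy sketch of that idea (the exhaustive breakpoint search is an illustrative choice, not the paper's estimator):

```python
import numpy as np

def fit_power_law(n, err):
    # error ~ a * n^b  <=>  log(err) = log(a) + b*log(n): linear least squares.
    b, log_a = np.polyfit(np.log(n), np.log(err), 1)
    return np.exp(log_a), b

def fit_piecewise_power_law(n, err):
    # Try every interior breakpoint; keep the two-segment fit with the
    # lowest total squared error in log space.
    n, err = np.asarray(n, float), np.asarray(err, float)
    best = None
    for k in range(2, len(n) - 2):
        (a1, b1) = fit_power_law(n[:k], err[:k])
        (a2, b2) = fit_power_law(n[k:], err[k:])
        resid = (np.sum((np.log(err[:k]) - np.log(a1 * n[:k] ** b1)) ** 2)
                 + np.sum((np.log(err[k:]) - np.log(a2 * n[k:] ** b2)) ** 2))
        if best is None or resid < best[0]:
            best = (resid, n[k], (a1, b1), (a2, b2))
    return best  # (residual, breakpoint sample size, small-n fit, large-n fit)
```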
arXiv Detail & Related papers (2023-03-02T21:48:22Z) - Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
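PVI itself has a simple form: it compares how predictable the label is with and without the input, using one model fine-tuned normally and one fine-tuned with the input blanked out. A small sketch of PVI-based filtering for the synthesized utterances; the two scorer callbacks and the threshold are assumptions.

```python
import math

def pvi(p_null: float, p_cond: float) -> float:
    # PVI(x -> y) = -log2 g_null(y) + log2 g(y | x): how much easier the
    # label y becomes to predict once the model sees the utterance x.
    return -math.log2(p_null) + math.log2(p_cond)

def keep_useful(synthetic, p_null_fn, p_cond_fn, threshold=0.5):
    # Filter generated (utterance, intent) pairs: keep those whose PVI
    # exceeds a threshold, i.e., datapoints the classifier finds genuinely
    # informative. p_null_fn / p_cond_fn are assumed probability scorers
    # for the input-blanked and normally fine-tuned models.
    return [(x, y) for x, y in synthetic
            if pvi(p_null_fn(y), p_cond_fn(x, y)) > threshold]
```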
arXiv Detail & Related papers (2023-02-10T07:37:49Z) - Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD).
FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
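EXP3 and UCB1 are standard multi-armed bandit algorithms; in this setting each auxiliary dataset is an arm, and the reward is some observed benefit of training on a batch from it (for example, gradient alignment with the few-shot target task). A minimal UCB1 loop under that assumed reward signal:

```python
import math

def ucb1_flad(aux_datasets, reward_fn, total_rounds=1000):
    # aux_datasets: list of auxiliary data sources (bandit arms).
    # reward_fn(arm) -> observed benefit of training on a batch from `arm`,
    # e.g., gradient alignment with the target task (assumed signal).
    counts = [0] * len(aux_datasets)
    means = [0.0] * len(aux_datasets)
    for t in range(1, total_rounds + 1):
        if t <= len(aux_datasets):
            arm = t - 1  # play every arm once first
        else:
            arm = max(range(len(aux_datasets)),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fn(aux_datasets[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means  # learned value of each auxiliary dataset
```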
arXiv Detail & Related papers (2023-02-01T18:59:36Z)