Evaluating LLM-based Personal Information Extraction and Countermeasures
- URL: http://arxiv.org/abs/2408.07291v3
- Date: Fri, 31 Jan 2025 05:16:50 GMT
- Title: Evaluating LLM-based Personal Information Extraction and Countermeasures
- Authors: Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong,
- Abstract summary: Large language model (LLM) based personal information extraction can be benchmarked.
LLM can be misused by attackers to accurately extract various personal information from personal profiles.
prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
- Score: 63.91918057570824
- License:
- Abstract: Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
Related papers
- Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory [11.83473842859642]
This work is the first to identify model inversion attacks in the split learning framework for personalized LLMs.
We propose a two-stage attack system in which the first part projects representations into the embedding space, and the second part uses a generative model to recover text from these embeddings.
arXiv Detail & Related papers (2025-01-10T13:47:13Z) - MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs [28.593941036010417]
We introduce MemHunter, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level.
MemHunter is the first method capable of dataset-level memorization detection, providing a critical tool for assessing privacy risks in large-scale datasets.
arXiv Detail & Related papers (2024-12-10T07:42:46Z) - LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z) - On Extracting Specialized Code Abilities from Large Language Models: A
Feasibility Study [22.265542509143756]
We investigate the feasibility of launching imitation attacks on large language models (LLMs)
We show that attackers can train a medium-sized backbone model to replicate specialized code behaviors similar to the target LLMs.
arXiv Detail & Related papers (2023-03-06T10:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.