Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models
- URL: http://arxiv.org/abs/2507.04478v1
- Date: Sun, 06 Jul 2025 17:24:17 GMT
- Title: Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models
- Authors: Sathesh P. Sivashanmugam,
- Abstract summary: Large language models (LLMs) have transformed natural language processing, but their ability to memorize training data poses significant privacy risks.<n>This paper investigates model inversion attacks on the Llama 3.2 model, a multilingual LLM developed by Meta.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have transformed natural language processing, but their ability to memorize training data poses significant privacy risks. This paper investigates model inversion attacks on the Llama 3.2 model, a multilingual LLM developed by Meta. By querying the model with carefully crafted prompts, we demonstrate the extraction of personally identifiable information (PII) such as passwords, email addresses, and account numbers. Our findings highlight the vulnerability of even smaller LLMs to privacy attacks and underscore the need for robust defenses. We discuss potential mitigation strategies, including differential privacy and data sanitization, and call for further research into privacy-preserving machine learning techniques.
Related papers
- A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models [1.2874523233023452]
We introduce Private Memorization Editing (PME), an approach for preventing private data leakage.<n>We detect a memorized PII and then mitigate the memorization of PII by editing a model knowledge of its training data.<n>PME can effectively reduce the number of leaked PII in a number of configurations, in some cases even reducing the accuracy of the privacy attacks to zero.
arXiv Detail & Related papers (2025-06-09T17:57:43Z) - Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training [13.680205342714412]
Large language models (LLMs) have become the backbone of modern natural language processing but pose privacy concerns about leaking sensitive training data.<n>We propose methodname, a lightweight yet effective empirical privacy defense for protecting training data of language models by leveraging token-specific characteristics.
arXiv Detail & Related papers (2025-02-27T03:37:45Z) - Evaluating LLM-based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language model (LLM) based personal information extraction can be benchmarked.<n>LLM can be misused by attackers to accurately extract various personal information from personal profiles.<n> prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
arXiv Detail & Related papers (2024-08-14T04:49:30Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications.<n>Existing techniques face the emerging challenges of the re-identification ability of large language models.<n>We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Beyond Gradient and Priors in Privacy Attacks: Leveraging Pooler Layer Inputs of Language Models in Federated Learning [24.059033969435973]
This paper presents a two-stage privacy attack strategy that targets the vulnerabilities in the architecture of contemporary language models.
Our comparative experiments demonstrate superior attack performance across various datasets and scenarios.
We call for the community to recognize and address these potential privacy risks in designing large language models.
arXiv Detail & Related papers (2023-12-10T01:19:59Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Quantifying and Analyzing Entity-level Memorization in Large Language
Models [4.59914731734176]
Large language models (LLMs) have been proven capable of memorizing their training data.
Privacy risks arising from memorization have attracted increasing attention.
We propose a fine-grained, entity-level definition to quantify memorization with conditions and metrics closer to real-world scenarios.
arXiv Detail & Related papers (2023-08-30T03:06:47Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.