Unintended Memorization of Sensitive Information in Fine-Tuned Language Models
- URL: http://arxiv.org/abs/2601.17480v1
- Date: Sat, 24 Jan 2026 15:08:45 GMT
- Title: Unintended Memorization of Sensitive Information in Fine-Tuned Language Models
- Authors: Marton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert
- Abstract summary: Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII). We design controlled extraction probes to quantify unintended PII memorization and study how factors such as language, PII frequency, task type, and model size influence memorization behavior.
- Score: 24.228889351240838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII), which can violate privacy regulations and compromise individual safety. In this work, we systematically investigate a critical and underexplored vulnerability: the exposure of PII that appears only in model inputs, not in training targets. Using both synthetic and real-world datasets, we design controlled extraction probes to quantify unintended PII memorization and study how factors such as language, PII frequency, task type, and model size influence memorization behavior. We further benchmark four privacy-preserving approaches including differential privacy, machine unlearning, regularization, and preference alignment, evaluating their trade-offs between privacy and task performance. Our results show that post-training methods generally provide more consistent privacy-utility trade-offs, while differential privacy achieves strong reduction in leakage in specific settings, although it can introduce training instability. These findings highlight the persistent challenge of memorization in fine-tuned LLMs and emphasize the need for robust, scalable privacy-preserving techniques.
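As a rough illustration of the controlled extraction probes described in the abstract, the sketch below prompts a fine-tuned causal LM with the input context that preceded a PII value during fine-tuning and checks whether that value resurfaces in the greedy continuation. The checkpoint name, prompt prefixes, and `pii_records` structure are illustrative assumptions, not the authors' actual probe design.

```python
# Minimal sketch of a PII extraction probe (illustrative; not the authors' exact protocol).
# Assumes a causal LM that was fine-tuned on records whose *inputs* contained PII.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "my-org/finetuned-clinical-llm"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical probe records: the prefix seen during fine-tuning and the PII that followed it.
pii_records = [
    {"prefix": "Patient name:", "pii": "Jane Doe"},
    {"prefix": "Contact phone:", "pii": "+1 555 0100"},
]

leaked = 0
for rec in pii_records:
    inputs = tokenizer(rec["prefix"], return_tensors="pt")
    # Greedy decoding keeps the probe deterministic; sampling-based probes are also possible.
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    continuation = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    if rec["pii"] in continuation:
        leaked += 1

print(f"Extraction rate: {leaked / len(pii_records):.2%}")
```

A higher extraction rate on such probes would indicate that PII appearing only in model inputs was nonetheless memorized, which is the vulnerability the paper studies.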
Related papers
- Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models [47.866853046761044]
We find that diverse, subtle patterns in training data can degrade contextual privacy. Fine-tuned models lose their ability to reason about contextual privacy norms. Our results reveal a critical gap in current safety evaluations.
arXiv Detail & Related papers (2026-01-21T17:53:06Z) - Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models [28.389198065125314]
Selective forgetting (also known as machine unlearning) has shown promise for privacy and data removal tasks. Despite its promise, selective forgetting raises significant privacy concerns. We present the first comprehensive benchmark for evaluating privacy vulnerabilities in selective forgetting.
arXiv Detail & Related papers (2025-12-19T20:04:06Z) - Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning [26.034865955638864]
We propose a privacy-enhanced continual learning framework that forgets what's sensitive and remembers what matters. Our approach first introduces a token-level dynamic Differential Privacy strategy. Second, we integrate a privacy-guided memory sculpting module.
arXiv Detail & Related papers (2025-09-16T11:01:59Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - Differential Privacy in Machine Learning: From Symbolic AI to LLMs [49.1574468325115]
Differential privacy provides a formal framework to mitigate privacy risks. It ensures that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm.
arXiv Detail & Related papers (2025-06-13T11:30:35Z) - Adaptive Clipping for Privacy-Preserving Few-Shot Learning: Enhancing Generalization with Limited Data [12.614480013684759]
We introduce a novel approach called Meta-Clip to enhance the utility of privacy-preserving few-shot learning methods. By dynamically adjusting clipping thresholds during the training process, our Adaptive Clipping method provides fine-grained control over the disclosure of sensitive information. We demonstrate the effectiveness of our approach in minimizing utility degradation, showcasing a superior privacy-utility trade-off compared to existing privacy-preserving techniques.
arXiv Detail & Related papers (2025-03-27T05:14:18Z) - Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions [11.338466798715906]
Fine-tuning Large Language Models (LLMs) can achieve state-of-the-art performance across various domains. This paper provides a comprehensive survey of privacy challenges associated with fine-tuning LLMs. We highlight vulnerabilities to various privacy attacks, including membership inference, data extraction, and backdoor attacks.
arXiv Detail & Related papers (2024-12-21T06:41:29Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural network training, such as gradient clipping and noise addition, affect the robustness of the model (a minimal sketch of these two ingredients follows this list).
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
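Several entries above refer to the two core ingredients of differentially private training: per-example gradient clipping and Gaussian noise addition (see "Robustness Threats of Differential Privacy"). The sketch below shows those two steps on a toy model; the clip norm, noise multiplier, learning rate, and model are illustrative assumptions rather than settings taken from any of the listed papers.

```python
# Minimal sketch of the two DP-SGD ingredients: per-example gradient clipping and noise addition.
# Hyperparameters (clip_norm, noise_multiplier, lr) are illustrative, not values from the papers above.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # toy model standing in for a fine-tuned LLM
loss_fn = torch.nn.MSELoss()
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.1

x = torch.randn(8, 4)                  # a toy batch of 8 examples
y = torch.randn(8, 1)

summed_grads = [torch.zeros_like(p) for p in model.parameters()]
for i in range(x.size(0)):
    model.zero_grad()
    loss = loss_fn(model(x[i : i + 1]), y[i : i + 1])
    loss.backward()
    # Clip this example's gradient so its L2 norm is at most clip_norm.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, clip_norm / (total_norm + 1e-6))
    for acc, g in zip(summed_grads, grads):
        acc += g * scale

# Add Gaussian noise calibrated to the clipping bound, then average and take one SGD step.
with torch.no_grad():
    for p, acc in zip(model.parameters(), summed_grads):
        noise = torch.normal(0.0, noise_multiplier * clip_norm, size=acc.shape)
        p -= lr * (acc + noise) / x.size(0)
```

Clipping bounds each example's contribution to the update, and the Gaussian noise is calibrated to that bound; together these are what yield the (epsilon, delta)-style guarantee that the inclusion or exclusion of any single data point does not significantly alter the output, as described in the "Differential Privacy in Machine Learning" summary above.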