Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
- URL: http://arxiv.org/abs/2508.14062v1
- Date: Sun, 10 Aug 2025 10:26:55 GMT
- Title: Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
- Authors: Badrinath Ramakrishnan, Akshaya Balaji
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks. Their tendency to memorize training data poses significant privacy risks, particularly during fine-tuning processes. This paper presents a comprehensive empirical analysis of data memorization in fine-tuned LLMs and introduces a novel multi-layered privacy protection framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, but their tendency to memorize training data poses significant privacy risks, particularly during fine-tuning processes. This paper presents a comprehensive empirical analysis of data memorization in fine-tuned LLMs and introduces a novel multi-layered privacy protection framework. Through controlled experiments on modern LLM architectures including GPT-2, Phi-3, and Gemma-2, we demonstrate that fine-tuning with repeated sensitive data increases privacy leakage rates from baseline levels of 0-5% to 60-75%, representing a 64.2% average increase across tested models. We propose and rigorously evaluate four complementary privacy protection methods: semantic data deduplication, differential privacy during generation, entropy-based filtering, and pattern-based content filtering. Our experimental results show that these techniques can reduce data leakage to 0% while maintaining 94.7% of original model utility.
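To make two of the four proposed defenses concrete, the sketch below shows what pattern-based content filtering and entropy-based filtering of generated text could look like in Python. The regular expressions, the `entropy_threshold` value, and the `filter_generation` helper are illustrative assumptions for this sketch; the paper does not publish its exact patterns or thresholds.

```python
import math
import re
from collections import Counter
from typing import Optional

# Illustrative regexes for common PII shapes (emails, US-style SSNs, phone numbers).
# These patterns are assumptions, not details taken from the paper.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),                              # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                                     # SSN-like numbers
    re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),   # phone numbers
]

def character_entropy(text: str) -> float:
    """Shannon entropy (bits per character); low values suggest rote, repeated output."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def filter_generation(text: str, entropy_threshold: float = 2.5) -> Optional[str]:
    """Return the text if it passes both filters, else None (suppress the output).

    entropy_threshold is a placeholder value, not a figure reported in the paper.
    """
    if any(p.search(text) for p in PII_PATTERNS):
        return None   # pattern-based filter: output looks like leaked PII
    if character_entropy(text) < entropy_threshold:
        return None   # entropy-based filter: suspiciously low-diversity output
    return text

# Usage sketch: wrap the model's decode step, e.g.
#   safe = filter_generation(model.generate(prompt))
#   if safe is None: regenerate or refuse to answer.
```

Applying the filters at generation time, rather than at training time, lets them sit alongside semantic deduplication of the fine-tuning data and differentially private generation as independent layers of the framework described in the abstract.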
Related papers
- Private PoEtry: Private In-Context Learning via Product of Experts [58.496468062236225]
In-context learning (ICL) enables Large Language Models to adapt to new tasks with only a small set of examples at inference time. Existing differential privacy approaches to ICL are either computationally expensive or rely on oversampling, synthetic data generation, or unnecessary thresholding. We reformulate private ICL through the lens of a Product-of-Experts model. This gives a theoretically grounded framework, and the algorithm can be trivially parallelized. We find that our method improves accuracy by more than 30 percentage points on average compared to prior DP-ICL methods, while maintaining strong privacy guarantees.
arXiv Detail & Related papers (2026-02-04T19:56:24Z) - Unintended Memorization of Sensitive Information in Fine-Tuned Language Models [24.228889351240838]
Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII). We design controlled extraction probes to quantify unintended PII memorization and study how factors such as language, PII frequency, task type, and model size influence memorization behavior.
arXiv Detail & Related papers (2026-01-24T15:08:45Z) - Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs [2.9506547907696006]
We introduce Randomized Masked Fine-Tuning (RMFT), a privacy-preserving fine-tuning technique that reduces memorization while minimizing performance impact. We demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning.
arXiv Detail & Related papers (2025-12-02T23:46:42Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models [50.794651919028965]
Recent studies have demonstrated the vulnerability of sequential recommender systems to Model Extraction Attacks (MEAs). Black-box attacks in prior MEAs are ineffective at exposing recommender system vulnerabilities due to random sampling in data selection. We propose LLM4MEA, a novel model extraction method that leverages Large Language Models (LLMs) as human-like rankers to generate data.
arXiv Detail & Related papers (2025-07-22T19:20:23Z) - SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks [17.77094760401298]
We study the vulnerability of fine-tuned large language models to membership inference attacks (MIAs). We propose SOFT, a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection.
arXiv Detail & Related papers (2025-06-12T07:23:56Z) - Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks. We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z) - STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions [6.19084217044276]
Mitigating explicit and implicit biases in Large Language Models (LLMs) has become a critical focus in the field of natural language processing. We introduce the Sensitivity Testing on Offensive Progressions dataset, which includes 450 offensive progressions containing 2,700 unique sentences. Our findings reveal that even the best-performing models detect bias inconsistently, with success rates ranging from 19.3% to 69.8%.
arXiv Detail & Related papers (2024-09-20T18:34:38Z) - Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data [18.984529269623135]
This study investigates whether fine-tuning with generated data truly enhances privacy or introduces additional privacy risks. We use the Pythia Model Suite and Open Pre-trained Transformer to measure privacy risks.
arXiv Detail & Related papers (2024-09-12T10:14:12Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenges of the re-identification ability of large language models. We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - Locally Differentially Private Document Generation Using Zero Shot Prompting [61.20953109732442]
We propose a locally differentially private mechanism called DP-Prompt to counter author de-anonymization attacks.
When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks.
arXiv Detail & Related papers (2023-10-24T18:25:13Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.