PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
- URL: http://arxiv.org/abs/2510.07452v1
- Date: Wed, 08 Oct 2025 18:58:41 GMT
- Title: PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
- Authors: Anthony Hughes, Vasisht Duddu, N. Asokan, Nikolaos Aletras, Ning Ma
- Abstract summary: Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage, but incur large drops in utility. We propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and then directly edits PII circuits to reduce leakage.
- Score: 36.296154937249845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage, but incur large drops in utility. Based on a comprehensive study using circuit discovery to identify the computational circuits responsible for PII leakage in LMs, we hypothesize that specific PII leakage circuits are responsible for this behavior. Therefore, we propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and then directly edits PII circuits to reduce leakage. PATCH achieves a better privacy-utility trade-off than existing defenses, e.g., reducing recall of PII leakage from LMs by up to 65%. Finally, PATCH can be combined with DP to reduce the recall of residual leakage of an LM to as low as 0.01%. Our analysis shows that PII leakage circuits persist even after the application of existing defense mechanisms. In contrast, PATCH can effectively mitigate their impact.
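To make the circuit-patching idea concrete, below is a minimal sketch (not the paper's implementation). It assumes the leaky circuit has already been localized to a small set of attention heads by a circuit-discovery step, and then ablates those heads at inference time with forward hooks. The GPT-2 backbone, the head indices, and the zero-ablation edit are all placeholder assumptions; PATCH's actual discovery and editing procedure is defined in the paper.

```python
# Hedged sketch: ablate a hypothetical set of "PII heads" in GPT-2.
# The (layer, head) indices are placeholders, not the paper's findings.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical output of a circuit-discovery step: (layer, head) pairs
# whose outputs drive PII reconstruction.
PII_HEADS = {(5, 3), (5, 7), (9, 1)}

def make_head_ablation_hook(heads_to_zero, num_heads):
    """Zero the per-head slices of the input to attn.c_proj (i.e. ablate heads)."""
    def hook(module, inputs):
        (hidden,) = inputs                          # [batch, seq, hidden]
        head_dim = hidden.shape[-1] // num_heads
        hidden = hidden.clone()
        for h in heads_to_zero:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,)
    return hook

for layer, block in enumerate(model.transformer.h):
    heads = {h for (l, h) in PII_HEADS if l == layer}
    if heads:
        block.attn.c_proj.register_forward_pre_hook(
            make_head_ablation_hook(heads, model.config.n_head)
        )

prompt = "The patient's phone number is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(out[0]))
```

In this sketch the edit is a hard zero-ablation applied at generation time; a learned or targeted weight edit, as the paper's name suggests, would replace the hook body but keep the same localization-then-edit structure.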
Related papers
- RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation [0.0]
Models may still rely on memorized training data, bypass the retrieved evidence, and produce contaminated outputs. We introduce Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects such behavior without requiring model access or retraining.
arXiv Detail & Related papers (2025-06-18T14:48:19Z)
- Exploiting Inaccurate Branch History in Side-Channel Attacks [54.218160467764086]
This paper examines how resource sharing and contention affect two widely implemented but underdocumented features: Bias-Free Branch Prediction and Branch History Speculation. We show that these features can inadvertently modify the Branch History Buffer (BHB) update behavior and create new primitives that trigger malicious mis-speculations. We present three novel attack primitives: two Spectre attacks, namely Spectre-BSE and Spectre-BHS, and a cross-privilege control flow side-channel attack called BiasScope.
arXiv Detail & Related papers (2025-06-08T19:46:43Z)
- PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII). Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage. We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z)
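The entry above describes intervening on interpretable sparse-autoencoder (SAE) features rather than raw neurons. The sketch below illustrates that shape of intervention with a toy, untrained SAE and arbitrary feature indices; both are assumptions for illustration, since the real framework trains the SAE on model activations and selects PII-correlated features empirically.

```python
# Toy sketch of SAE-based feature intervention. Assumptions: untrained SAE,
# arbitrary layer choice and feature indices.
import torch
import torch.nn as nn

HIDDEN, N_FEATURES = 768, 4096
PII_FEATURES = [10, 512, 2047]  # hypothetical PII-associated latents

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_sae):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x, zero_features=()):
        z = torch.relu(self.enc(x))            # sparse latent code
        if zero_features:
            z[..., list(zero_features)] = 0.0  # ablate selected features
        return self.dec(z)

sae = SparseAutoencoder(HIDDEN, N_FEATURES)

def intervention_hook(module, inputs, output):
    """Replace a block's residual-stream output with its SAE reconstruction,
    with PII-associated features zeroed out."""
    hidden = output[0] if isinstance(output, tuple) else output
    patched = sae(hidden, zero_features=PII_FEATURES)
    return (patched,) + output[1:] if isinstance(output, tuple) else patched

# Standalone demo on a random activation:
x = torch.randn(1, 5, HIDDEN)
print(sae(x, zero_features=PII_FEATURES).shape)  # torch.Size([1, 5, 768])

# In practice the hook would be attached to one transformer block, e.g.:
# model.transformer.h[6].register_forward_hook(intervention_hook)
```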
- Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility [39.51362903320998]
We propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in large language models (LLMs). This mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting to maintain the LLM's functionality. Results show that PPA eliminates the risk of phone number exposure entirely (a 100% reduction) and reduces the risk of physical address exposure by 9.8%-87.6%.
arXiv Detail & Related papers (2025-02-24T19:16:39Z)
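A rough sketch of the forget-then-implant idea summarized above, using a simple weighted objective: gradient ascent on a memorized PII sequence plus standard training on a redacted replacement. The example text, loss weighting, and number of steps are illustrative assumptions, not the paper's procedure for identifying key memories.

```python
# Hedged sketch of a forget-then-implant fine-tuning loop (placeholder data,
# weights, and step count; not the PPA paper's exact method).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

pii_seq = "John Smith's phone number is 555-0123"        # memorized PII
implant_seq = "John Smith's phone number is [REDACTED]"  # replacement memory

def lm_loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss

forget_weight = 0.5
for _ in range(3):  # a few illustrative steps
    opt.zero_grad()
    # Ascend on the PII sequence (forget) and descend on the implant sequence.
    loss = -forget_weight * lm_loss(pii_seq) + lm_loss(implant_seq)
    loss.backward()
    opt.step()
```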
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs [92.7279890407059]
Multi-head Latent Attention (MLA) is an innovative architecture designed to ensure efficient and economical inference. This paper proposes the first data-efficient fine-tuning method for transitioning from Multi-Head Attention to MLA.
arXiv Detail & Related papers (2025-02-20T18:50:42Z)
- RandOhm: Mitigating Impedance Side-channel Attacks using Randomized Circuit Configurations [6.388730198692013]
We introduce RandOhm, which exploits a moving target defense (MTD) strategy based on the partial reconfiguration (PR) feature of mainstream FPGAs.
We demonstrate that the information leakage through the PDN impedance could be significantly reduced via runtime reconfiguration of the secret-sensitive parts of the circuitry.
In contrast to existing PR-based countermeasures, RandOhm deploys open-source bitstream manipulation tools to speed up the randomization and provide real-time protection.
arXiv Detail & Related papers (2024-01-17T02:22:28Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
- Analyzing Leakage of Personally Identifiable Information in Language Models [13.467340359030855]
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to which extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
arXiv Detail & Related papers (2023-02-01T16:04:48Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
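For the uncertainty-driven objective described in the entry above, a minimal sketch of a pessimistic bootstrap target built from a Q-ensemble is shown below. The lower-confidence-bound penalty, network sizes, and coefficient beta are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch: ensemble-disagreement-penalized (pessimistic) bootstrap target.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_ENSEMBLE, GAMMA, BETA = 17, 6, 5, 0.99, 1.0

def make_q():
    return nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256),
                         nn.ReLU(), nn.Linear(256, 1))

q_ensemble = nn.ModuleList(make_q() for _ in range(N_ENSEMBLE))

def pessimistic_target(reward, next_state, next_action, done):
    """Bootstrap target penalized by ensemble disagreement (epistemic proxy)."""
    sa = torch.cat([next_state, next_action], dim=-1)
    qs = torch.stack([q(sa) for q in q_ensemble], dim=0)  # [K, batch, 1]
    lcb = qs.mean(0) - BETA * qs.std(0)                   # lower confidence bound
    return reward + GAMMA * (1.0 - done) * lcb

# Example with random batch data:
batch = 4
target = pessimistic_target(torch.randn(batch, 1),
                            torch.randn(batch, STATE_DIM),
                            torch.randn(batch, ACTION_DIM),
                            torch.zeros(batch, 1))
print(target.shape)  # torch.Size([4, 1])
```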