Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD
- URL: http://arxiv.org/abs/2602.22611v1
- Date: Thu, 26 Feb 2026 04:32:14 GMT
- Title: Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD
- Authors: Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen
- Abstract summary: This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which allocates privacy protection across layers in proportion to their MIA risk. Under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
- Score: 26.493235454865538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
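The pipeline described in the abstract lends itself to a compact sketch. The NumPy snippet below illustrates the reweighting-then-global-clipping step for a single example's per-layer gradients; the weighting rule (higher weight for layers whose shadow MIA adversary has a lower attack error rate) and all names (`mia_risk_weights`, `lm_dp_sgd_step`, `C`, `sigma`) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of layer-wise MIA-risk-aware reweighting under a single
# global clipping threshold and a fixed noise magnitude. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def mia_risk_weights(attack_error_rates):
    """Map per-layer shadow-attack error rates to per-layer weights.
    A low error rate means the layer leaks more membership signal, so it
    gets a larger weight (assumed rule; weights normalised to mean 1)."""
    risk = 1.0 - np.asarray(attack_error_rates)
    return risk * len(risk) / risk.sum()

def lm_dp_sgd_step(layer_grads, weights, C=1.0, sigma=1.0):
    """Reweight one example's per-layer grads, clip globally, add noise."""
    reweighted = [w * g for w, g in zip(weights, layer_grads)]
    global_norm = np.sqrt(sum((g ** 2).sum() for g in reweighted))
    scale = min(1.0, C / (global_norm + 1e-12))   # global clipping
    return [scale * g + sigma * C * rng.normal(size=g.shape)
            for g in reweighted]

# Example: three layers; the shadow adversaries flag layer 1 as most leaky.
grads = [rng.normal(size=(4, 4)) for _ in range(3)]
weights = mia_risk_weights([0.45, 0.10, 0.40])
noisy_grads = lm_dp_sgd_step(grads, weights)
```

In full DP-SGD the clipped per-example gradients would be averaged over a batch before a single noise draw is added; the single-example form above is kept only for brevity.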
Related papers
- In-Context Probing for Membership Inference in Fine-Tuned Language Models [14.590625376049955]
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs). We propose ICP-MIA, a novel MIA framework grounded in the theory of training dynamics. ICP-MIA significantly outperforms prior black-box MIAs, particularly at low false positive rates.
arXiv Detail & Related papers (2025-12-18T08:26:26Z) - Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks [21.852575873751917]
Membership Inference Attacks (MIAs) pose serious threats to privacy and trust in sensitive domains. We introduce a novel, model-agnostic defense framework, Ensemble Privacy Defense (EPD). EPD reduces MIA success by up to 27.8% for SFT and 526.3% for RAG compared to the inference-time baseline.
arXiv Detail & Related papers (2025-12-01T18:12:18Z) - Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments [0.6850683267295249]
Federated Learning (FL) allows for the training of Machine Learning models in a collaborative manner without the need to share sensitive data. It remains vulnerable to Gradient Leakage Attacks (GLAs), which can reveal private information from the shared model updates. We investigate the effectiveness of Differential Privacy mechanisms as defenses against GLAs.
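As a concrete reference point, the generic DP mechanism evaluated in such studies can be sketched in a few lines: each client clips its update and adds calibrated Gaussian noise before sharing, so a gradient-leakage adversary only ever observes a perturbed update. Function names and parameter values below are illustrative assumptions, not this paper's experimental setup.

```python
# Minimal sketch: client-side clipping + Gaussian noise before sharing,
# then server-side averaging. Illustrative parameters only.
import numpy as np

rng = np.random.default_rng(1)

def privatize_update(update, clip=1.0, sigma=0.5):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))  # bound sensitivity
    return clipped + sigma * clip * rng.normal(size=update.shape)

# The server (and any eavesdropper) only sees noised client updates.
server_view = np.mean([privatize_update(rng.normal(size=10))
                       for _ in range(8)], axis=0)
```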
arXiv Detail & Related papers (2025-10-27T23:33:21Z) - DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation [20.140666137717208]
We propose a differential calibration MIA that mitigates the negative impact of non-member-retrieved documents. Experiments show that DCMI consistently outperforms baselines, for example achieving 97.42% AUC and 94.35% accuracy against the RAG system with Flan-T5. These results highlight significant privacy risks in RAG systems and emphasize the need for stronger protection mechanisms.
arXiv Detail & Related papers (2025-09-07T11:58:02Z) - Defending against Indirect Prompt Injection by Instruction Detection [109.30156975159561]
InstructDetector is a novel detection-based approach that leverages the behavioral states of LLMs to identify potential indirect prompt injection (IPI) attacks. InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z) - Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
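To make the first claim concrete, a toy NumPy experiment (not the paper's BAM method) shows how per-example clipping biases the private gradient estimate, with the bias growing as the clipping threshold shrinks relative to typical per-sample gradient norms:

```python
# Toy demonstration of clipping bias in the DP-SGD gradient oracle.
import numpy as np

rng = np.random.default_rng(2)
grads = rng.normal(loc=1.0, scale=2.0, size=(10000, 5))  # per-sample grads

def clipped_mean(g, C):
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return (g * np.minimum(1.0, C / (norms + 1e-12))).mean(axis=0)

true_mean = grads.mean(axis=0)
for C in (0.5, 2.0, 8.0):
    bias = np.linalg.norm(clipped_mean(grads, C) - true_mean)
    print(f"C={C}: bias of clipped estimator = {bias:.3f}")
# Smaller C relative to the per-sample norms -> larger bias; this is the
# estimator bias that BAM aims to provably reduce.
```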
arXiv Detail & Related papers (2023-08-23T09:20:41Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations [53.268801169075836]
We propose FedLAP-DP, a novel privacy-preserving approach for federated learning.
A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes.
Our approach achieves faster convergence than typical gradient-sharing methods.
arXiv Detail & Related papers (2023-02-02T12:56:46Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated
Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection at the cost of training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
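The core trick can be sketched as follows: the users' noise terms are correlated across users so that they cancel in the over-the-air aggregate, leaving each individual transmission perturbed but the server-side sum accurate. The zero-sum construction below is an illustrative simplification of such a scheme, not the paper's exact design.

```python
# Minimal sketch of correlated additive perturbations: per-user noise rows
# are constructed to sum to zero, so the aggregate stays exact while every
# individual shared update is heavily perturbed. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n_users, dim = 8, 6
updates = rng.normal(size=(n_users, dim))

z = rng.normal(scale=5.0, size=(n_users, dim))
correlated_noise = z - z.mean(axis=0)      # rows sum to zero across users
shared = updates + correlated_noise        # what each user transmits

print(np.allclose(shared.sum(axis=0), updates.sum(axis=0)))  # True
# An eavesdropper on any single user sees update + large noise; in a full
# scheme a small independent noise term would be added on top for a formal
# privacy guarantee at the aggregate.
```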
arXiv Detail & Related papers (2022-10-05T13:13:35Z) - Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely adopted in deep learning to provide a rigorously defined privacy guarantee.
We propose to utilize randomly sparsified (RS) gradients, which provides compression to reduce communication cost and can strengthen the privacy bound.
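Under that reading, a minimal sketch of random sparsification inside one DP-SGD gradient computation looks as follows; the keep probability and function names are assumptions. Because the random mask is applied before clipping, the per-example sensitivity stays bounded by C and the Gaussian-mechanism guarantee is unaffected.

```python
# Minimal sketch of randomly sparsified gradients (RS) in a DP-SGD step.
import numpy as np

rng = np.random.default_rng(4)

def rs_dp_sgd_grad(g, keep_prob=0.5, C=1.0, sigma=1.0):
    mask = rng.random(g.shape) < keep_prob               # random zeroing
    g = g * mask
    g = g * min(1.0, C / (np.linalg.norm(g) + 1e-12))    # clip after RS
    return g + sigma * C * rng.normal(size=g.shape)

noisy = rs_dp_sgd_grad(rng.normal(size=100))
# Only ~keep_prob of the coordinates carry signal, which is what yields
# the communication compression mentioned above.
```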
arXiv Detail & Related papers (2021-12-01T21:43:34Z) - Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
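A minimal sketch of the projection step follows, assuming an orthonormal basis `V` for the identified subspace (random here purely for illustration; the method identifies it from gradient structure). Noise is added to the k subspace coordinates of the clipped gradient, so the expected noise norm scales with sqrt(k) rather than sqrt(p), and mapping back to R^p is post-processing that preserves the privacy guarantee.

```python
# Minimal sketch of Projected DP-SGD-style noise reduction. Illustrative.
import numpy as np

rng = np.random.default_rng(5)
p, k = 1000, 20
V, _ = np.linalg.qr(rng.normal(size=(p, k)))   # orthonormal basis, p x k

def projected_dp_grad(g, C=1.0, sigma=1.0):
    g = g * min(1.0, C / (np.linalg.norm(g) + 1e-12))  # clip to norm C
    coords = V.T @ g                                   # project to k dims
    coords += sigma * C * rng.normal(size=k)           # noise in subspace
    return V @ coords                                  # map back to R^p

g_priv = projected_dp_grad(rng.normal(size=p))
```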
arXiv Detail & Related papers (2020-07-07T22:31:01Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
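Laplacian smoothing addresses that degradation as post-processing of the already-private noisy gradient, so it costs no extra privacy budget while damping high-frequency noise. A minimal sketch, assuming one common form of the smoothing operator, the 1-D periodic discrete Laplacian solved via FFT, with illustrative parameter values:

```python
# Minimal sketch: solve (I - sigma_s * Delta) v = g via FFT, where Delta
# is the 1-D periodic discrete Laplacian. Post-processing preserves DP.
import numpy as np

def laplacian_smooth(g, sigma_s=1.0):
    n = g.size
    k = np.arange(n)
    # Eigenvalues of (I - sigma_s * Delta) under periodic boundary:
    denom = 1.0 + sigma_s * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n))
    return np.real(np.fft.ifft(np.fft.fft(g) / denom))

rng = np.random.default_rng(6)
noisy_grad = rng.normal(size=64) + 1.0
smoothed = laplacian_smooth(noisy_grad)
print(noisy_grad.std(), smoothed.std())  # smoothing reduces variance
```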
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.