On Membership Inference Attacks in Knowledge Distillation
- URL: http://arxiv.org/abs/2505.11837v1
- Date: Sat, 17 May 2025 04:54:26 GMT
- Title: On Membership Inference Attacks in Knowledge Distillation
- Authors: Ziyao Cui, Minxing Zhang, Jian Pei
- Abstract summary: This paper investigates how knowledge distillation affects model robustness against Membership Inference Attacks (MIAs). We show that while teacher and student models achieve similar overall MIA accuracy, teacher models better protect member data, the primary target of MIA. We propose 5 privacy-preserving distillation methods and demonstrate that they successfully reduce student models' vulnerability to MIA.
- Score: 24.10582361065246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, Large Language Models (LLMs) are trained on huge datasets, some including sensitive information. This poses a serious privacy concern because privacy attacks such as Membership Inference Attacks (MIAs) may detect this sensitive information. While knowledge distillation compresses LLMs into efficient, smaller student models, its impact on privacy remains underexplored. In this paper, we investigate how knowledge distillation affects model robustness against MIA. We focus on two questions. First, how is private data protected in teacher and student models? Second, how can we strengthen privacy preservation against MIAs in knowledge distillation? Through comprehensive experiments, we show that while teacher and student models achieve similar overall MIA accuracy, teacher models better protect member data, the primary target of MIA, whereas student models better protect non-member data. To address this vulnerability in student models, we propose 5 privacy-preserving distillation methods and demonstrate that they successfully reduce student models' vulnerability to MIA, with ensembling further stabilizing the robustness, offering a reliable approach for distilling more secure and efficient student models. Our implementation source code is available at https://github.com/richardcui18/MIA_in_KD.
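The following is a minimal, self-contained sketch (not the authors' evaluation protocol) of the kind of loss-threshold MIA comparison the abstract describes: it sweeps a threshold over per-example losses and reports overall attack accuracy alongside how well member and non-member data are protected, for a teacher and a distilled student. The loss values below are synthetic placeholders, not measurements from the paper.

```python
# Minimal sketch of a loss-threshold membership inference attack (MIA),
# used only to illustrate the teacher-vs-student comparison described above.
# In practice the per-example losses would come from the teacher / student
# models; the synthetic numbers below are placeholders, not real results.
import numpy as np

def mia_accuracy(member_losses: np.ndarray, nonmember_losses: np.ndarray) -> dict:
    """Sweep a loss threshold; predict 'member' when loss < threshold."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses), np.zeros_like(nonmember_losses)])
    best = {"acc": 0.0, "tpr_member": 0.0, "tnr_nonmember": 0.0, "threshold": None}
    for t in np.unique(losses):
        pred = (losses < t).astype(float)
        acc = (pred == labels).mean()
        if acc > best["acc"]:
            best = {
                "acc": float(acc),
                "tpr_member": float(pred[labels == 1].mean()),        # members are protected when this is low
                "tnr_nonmember": float(1 - pred[labels == 0].mean()),  # non-members are protected when this is high
                "threshold": float(t),
            }
    return best

# Hypothetical usage: losses measured on the same member / non-member split
# for a teacher and its distilled student (values are illustrative only).
rng = np.random.default_rng(0)
teacher = mia_accuracy(rng.normal(0.5, 0.3, 1000).clip(0), rng.normal(1.0, 0.3, 1000).clip(0))
student = mia_accuracy(rng.normal(0.7, 0.3, 1000).clip(0), rng.normal(1.1, 0.3, 1000).clip(0))
print("teacher:", teacher)
print("student:", student)
```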
Related papers
- Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy [13.702759117522447]
Current research on Machine Unlearning (MU) attacks is limited and requires access to the original models containing private data. We propose an innovative study on recalling the forgotten class memberships from unlearned models without requiring access to the original one. Our study and evaluation establish a benchmark for future research on MU vulnerabilities.
arXiv Detail & Related papers (2025-06-24T10:21:10Z) - Membership Inference Attacks fueled by Few-Shot Learning to detect privacy leakage tackling data integrity [7.8973037023478785]
Deep learning models memorize parts of their training data, creating a privacy leak. We propose a Few-Shot learning based MIA, coined the FeS-MIA model, which eases the evaluation of the privacy breach of a deep learning model. We also propose an interpretable quantitative and qualitative measure of privacy, referred to as the Log-MIA measure.
arXiv Detail & Related papers (2025-03-12T13:09:43Z) - Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models [31.834967019893227]
To detect unauthorized data usage in training large-scale generative models, membership inference attacks (MIAs) have proven effective. We find that standard MIAs fail against distilled generative models (i.e., student models) that are increasingly deployed in practice for efficiency. We propose three principles of distribution-based MIAs for detecting unauthorized training data through distilled generative models.
arXiv Detail & Related papers (2025-02-05T08:11:23Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning [17.141646895576145]
Federated Learning (FL) is a promising approach for training machine learning models on decentralized data. Membership Inference Attacks (MIAs) aim to determine whether a specific data point belongs to a target client's training set. We introduce a three-step Membership Inference Attack (MIA) method, called FedMIA, which follows the "all for one" principle, leveraging updates from all clients across multiple communication rounds to enhance MIA effectiveness.
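A rough illustration of the "all for one" principle (this is not the FedMIA algorithm itself; the score, the assumed access to per-round models, and the numbers below are assumptions for exposition): membership in one target client's data is scored by contrasting a candidate's loss under the target's per-round updates against its loss under all other clients' updates.

```python
# Generic sketch of the "all for one" idea: use signals from every client's
# update across communication rounds to calibrate a membership score for one
# target client. NOT the FedMIA method; purely an illustration of the principle.
import numpy as np

def membership_score(target_losses: np.ndarray, other_losses: np.ndarray) -> float:
    """
    target_losses: shape (rounds,)      loss of candidate x under the target client's
                                        submitted model at each communication round.
    other_losses:  shape (rounds, K-1)  loss of x under the other clients' models,
                                        used as a per-round reference.
    A candidate the target actually trained on tends to have consistently lower
    loss under the target's updates than under the reference clients' updates.
    """
    per_round = other_losses.mean(axis=1) - target_losses  # > 0 hints at membership
    return float(per_round.mean())

# Hypothetical usage with synthetic numbers (10 rounds, 5 reference clients).
rng = np.random.default_rng(1)
member_x = membership_score(rng.normal(0.4, 0.1, 10), rng.normal(0.9, 0.1, (10, 5)))
nonmember_x = membership_score(rng.normal(0.9, 0.1, 10), rng.normal(0.9, 0.1, (10, 5)))
print(member_x, nonmember_x)  # the member score should be noticeably larger
```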
arXiv Detail & Related papers (2024-02-09T09:58:35Z) - Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z) - Students Parrot Their Teachers: Membership Inference on Model Distillation [54.392069096234074]
We study the privacy provided by knowledge distillation to both the teacher and student training sets.
Our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
arXiv Detail & Related papers (2023-03-06T19:16:23Z) - Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
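A hedged sketch of the density-ratio idea behind this line of work (not DOMIAS's actual estimators or calibration, which are more involved): a candidate is scored by the ratio of its density under the synthetic data to its density under a reference sample, so local overfitting of the generator inflates the score near members.

```python
# Rough sketch of a density-ratio membership score in the spirit of DOMIAS:
# compare the density of a candidate under the synthetic data (a proxy for the
# generator's output distribution) with its density under a reference sample.
# KDE is used purely for illustration.
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_score(x: np.ndarray, synthetic: np.ndarray, reference: np.ndarray) -> float:
    """x: (d,), synthetic/reference: (n, d). Higher ratio -> more likely a member."""
    p_synth = gaussian_kde(synthetic.T)(x.reshape(-1, 1))[0]
    p_ref = gaussian_kde(reference.T)(x.reshape(-1, 1))[0]
    return float(p_synth / (p_ref + 1e-12))

# Hypothetical usage: a generator that locally overfits around a training point
# yields inflated synthetic density (relative to the reference) near that member.
rng = np.random.default_rng(2)
reference = rng.normal(0, 1, (500, 2))
member = np.array([1.5, -0.5])
synthetic = np.vstack([rng.normal(0, 1, (450, 2)), member + rng.normal(0, 0.05, (50, 2))])
print(density_ratio_score(member, synthetic, reference))                 # clearly elevated
print(density_ratio_score(np.array([-1.5, 0.5]), synthetic, reference))  # close to 1
```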
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - On the Privacy Effect of Data Enhancement via the Lens of Memorization [20.63044895680223]
We propose to investigate privacy from a new perspective called memorization.
Through the lens of memorization, we find that previously deployed MIAs produce misleading results as they are less likely to identify samples with higher privacy risks.
We demonstrate that the generalization gap and privacy leakage are less correlated than previous results suggest.
arXiv Detail & Related papers (2022-08-17T13:02:17Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
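A simplified sketch of the relaxed-loss idea (not the full RelaxLoss method, which per the paper includes further components): once the training loss drops below a target level alpha, the objective flips sign, so members are not fit much more tightly than non-members.

```python
# Simplified sketch of a "relaxed loss" objective: stop minimizing once the
# training loss reaches a target level alpha. Illustrative only; the full
# RelaxLoss method is not reproduced here.
import torch
import torch.nn.functional as F

def relaxed_loss(logits: torch.Tensor, targets: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Cross-entropy that reverses direction when the batch loss drops below alpha."""
    ce = F.cross_entropy(logits, targets)
    if ce.item() >= alpha:
        return ce   # still above the target loss level: normal descent
    return -ce      # below target: gradient ascent pushes the loss back toward alpha

# Hypothetical usage inside a standard training step:
# loss = relaxed_loss(model(x_batch), y_batch, alpha=1.0)
# loss.backward(); optimizer.step()
```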
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Knowledge Cross-Distillation for Membership Privacy [0.9087641068861045]
A membership inference attack (MIA) poses privacy risks to the training data of a machine learning model.
We propose a novel defense against MIAs using knowledge distillation without requiring public data.
arXiv Detail & Related papers (2021-11-02T04:16:08Z) - Differentially Private Deep Learning with Smooth Sensitivity [144.31324628007403]
We study privacy concerns through the lens of differential privacy.
In this framework, privacy guarantees are generally obtained by perturbing models in such a way that specifics of data used to train the model are made ambiguous.
One of the most important techniques used in previous works involves an ensemble of teacher models, which return information to a student based on a noisy voting procedure.
In this work, we propose a novel voting mechanism with smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain conditions, can tolerate very large random noise from the teachers without affecting the useful information transferred to the student.
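For context, here is a minimal sketch of the teacher-ensemble noisy voting that this mechanism builds on; the Immutable Noisy ArgMax conditions themselves are not reproduced, and the class counts and noise scale are illustrative assumptions.

```python
# Minimal sketch of teacher-ensemble noisy voting: each teacher votes for a
# label, noise is added to the vote histogram, and the student only sees the
# noisy argmax. The paper's Immutable Noisy ArgMax adds conditions not shown here.
import numpy as np

def noisy_argmax_label(teacher_votes: np.ndarray, num_classes: int,
                       noise_scale: float, rng: np.random.Generator) -> int:
    """teacher_votes: (num_teachers,) integer class votes for one query."""
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=noise_scale, size=num_classes)  # Laplace-noised histogram
    return int(np.argmax(counts))

# Hypothetical usage: 50 teachers, 10 classes, one unlabeled student query.
rng = np.random.default_rng(3)
votes = rng.choice(10, size=50, p=[0.5] + [0.5 / 9] * 9)  # most teachers agree on class 0
print(noisy_argmax_label(votes, num_classes=10, noise_scale=2.0, rng=rng))
```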
arXiv Detail & Related papers (2020-03-01T15:38:00Z)