How to Backdoor the Knowledge Distillation
- URL: http://arxiv.org/abs/2504.21323v1
- Date: Wed, 30 Apr 2025 05:19:23 GMT
- Title: How to Backdoor the Knowledge Distillation
- Authors: Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu
- Abstract summary: We introduce a novel attack methodology that strategically poisons the distillation dataset with adversarial examples embedded with backdoor triggers. This technique allows for the stealthy compromise of the student model while maintaining the integrity of the teacher model. Our findings reveal previously unrecognized vulnerabilities and pave the way for future research aimed at securing knowledge distillation processes.
- Score: 10.478504819079548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation has become a cornerstone in modern machine learning systems, celebrated for its ability to transfer knowledge from a large, complex teacher model to a more efficient student model. Traditionally, this process is regarded as secure, assuming the teacher model is clean. This belief stems from conventional backdoor attacks relying on poisoned training data with backdoor triggers and attacker-chosen labels, which are not involved in the distillation process. Instead, knowledge distillation uses the outputs of a clean teacher model to guide the student model, inherently preventing recognition or response to backdoor triggers as intended by an attacker. In this paper, we challenge this assumption by introducing a novel attack methodology that strategically poisons the distillation dataset with adversarial examples embedded with backdoor triggers. This technique allows for the stealthy compromise of the student model while maintaining the integrity of the teacher model. Our innovative approach represents the first successful exploitation of vulnerabilities within the knowledge distillation process using clean teacher models. Through extensive experiments conducted across various datasets and attack settings, we demonstrate the robustness, stealthiness, and effectiveness of our method. Our findings reveal previously unrecognized vulnerabilities and pave the way for future research aimed at securing knowledge distillation processes against backdoor attacks.
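The mechanism the abstract describes is to poison only the distillation set: craft trigger-bearing adversarial examples that the clean, frozen teacher itself labels as the attacker's target class, so the student learns the trigger-to-target association from the teacher's own soft labels. Below is a minimal PyTorch-style sketch of that idea; the function names, the stamp-style trigger, the PGD-style optimization, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the poisoning idea described in the abstract.
# Assumes a clean, frozen `teacher` classifier in eval mode and images in [0, 1];
# names and hyperparameters are illustrative, not the paper's implementation.
import torch
import torch.nn.functional as F

def apply_trigger(x, trigger, mask):
    """Stamp a small trigger patch onto a batch of images."""
    return x * (1 - mask) + trigger * mask

def craft_poison(teacher, x, trigger, mask, target_class,
                 eps=8/255, alpha=2/255, steps=40):
    """PGD-style perturbation so the *clean* teacher assigns the
    attacker-chosen target class to the trigger-bearing image."""
    x_trig = apply_trigger(x, trigger, mask)
    delta = torch.zeros_like(x_trig, requires_grad=True)
    y_tgt = torch.full((x.size(0),), target_class,
                       dtype=torch.long, device=x.device)
    for _ in range(steps):
        logits = teacher(torch.clamp(x_trig + delta, 0, 1))
        loss = F.cross_entropy(logits, y_tgt)      # pull toward target class
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()     # targeted descent step
            delta.clamp_(-eps, eps)                # keep perturbation small
        delta.grad.zero_()
    return torch.clamp(x_trig + delta, 0, 1).detach()

def distill_step(student, teacher, x_clean, x_poison, T=4.0):
    """Standard soft-label distillation loss over a batch that mixes a small
    fraction of poisoned inputs into the otherwise clean distillation data."""
    x = torch.cat([x_clean, x_poison])
    with torch.no_grad():
        t_logits = teacher(x)                      # teacher stays untouched
    s_logits = student(x)
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
```

Because only the distillation data is modified, the teacher's weights and clean-input behavior remain intact, which is what makes the compromise of the student hard to detect from the teacher's side.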
Related papers
- Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples [1.1820990818670631]
This work is the first to provide provable guarantees on the success of a knowledge distillation-based attack on classification neural networks.
We prove that if the student model has sufficient learning capacity, an attack on the teacher model is guaranteed to be found within a finite number of distillation iterations.
arXiv Detail & Related papers (2024-10-21T11:06:56Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features, but backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
- Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation [68.8204255655161]
Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models.
We seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher.
We devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.
arXiv Detail & Related papers (2023-03-09T21:37:50Z)
- Students Parrot Their Teachers: Membership Inference on Model Distillation [54.392069096234074]
We study the privacy provided by knowledge distillation to both the teacher and student training sets.
Our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
arXiv Detail & Related papers (2023-03-06T19:16:23Z)
- Distilling the Undistillable: Learning from a Nasty Teacher [30.0248670422039]
We develop efficient methodologies to increase learning from a Nasty Teacher by up to 68.63% on standard datasets.
We also explore an improvised defense method based on our insights into stealing.
Our detailed set of experiments and ablations on diverse models/settings demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2022-10-21T04:35:44Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Teacher Model Fingerprinting Attacks Against Transfer Learning [23.224444604615123]
We present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context.
We propose a teacher model fingerprinting attack to infer the teacher model from which a student model was transferred.
We show that our attack can accurately identify the model origin with few probing queries.
arXiv Detail & Related papers (2021-06-23T15:52:35Z)