Swing Distillation: A Privacy-Preserving Knowledge Distillation
Framework
- URL: http://arxiv.org/abs/2212.08349v1
- Date: Fri, 16 Dec 2022 08:57:18 GMT
- Title: Swing Distillation: A Privacy-Preserving Knowledge Distillation
Framework
- Authors: Junzhuo Li, Xinwei Wu, Weilong Dong, Shuangzhi Wu, Chao Bian and Deyi
Xiong
- Abstract summary: We propose a novel knowledge distillation method, which can effectively prevent the private information of the teacher model from flowing to the student model.
Experiments on multiple datasets and tasks demonstrate that the proposed swing distillation can significantly reduce (by over 80% in terms of canary exposure) the risk of privacy leakage.
- Score: 38.68736962054861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) has been widely used for model compression and
knowledge transfer. Typically, a big teacher model trained on sufficient data
transfers knowledge to a small student model. However, despite the success of
KD, little effort has been made to study whether KD leaks the training data of
the teacher model. In this paper, we experimentally reveal that KD suffers from
the risk of privacy leakage. To alleviate this issue, we propose a novel
knowledge distillation method, swing distillation, which can effectively
prevent the private information of the teacher model from flowing to the
student model. In our framework, the temperature coefficient is dynamically and
adaptively adjusted according to the degree of private information contained in
the data, rather than being a predefined constant hyperparameter. It assigns
different temperatures to tokens according to the likelihood that a token in a
position contains private information. In addition, we inject noise into soft
targets provided to the student model, in order to avoid unshielded knowledge
transfer. Experiments on multiple datasets and tasks demonstrate that the
proposed swing distillation can significantly reduce (by over 80% in terms of
canary exposure) the risk of privacy leakage in comparison to KD with
competitive or better performance. Furthermore, swing distillation is robust
to increases in the privacy budget.
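The two ingredients described in the abstract (per-token temperatures driven by an estimated privacy score, and noise injected into the teacher's soft targets) can be illustrated with a minimal sketch. The function name, the linear temperature schedule, and the Gaussian noise model below are assumptions for illustration, not the authors' implementation; the privacy scores are treated as a given input rather than computed by any particular detector.

```python
# Hypothetical sketch of the swing-distillation idea: per-token temperatures rise
# with an (assumed) privacy score, and the teacher's soft targets are perturbed
# with noise before the student is trained on them.
import torch
import torch.nn.functional as F

def swing_distillation_loss(student_logits, teacher_logits, privacy_scores,
                            t_low=1.0, t_high=8.0, noise_std=0.05):
    """student_logits, teacher_logits: (batch, seq_len, vocab);
    privacy_scores: (batch, seq_len) in [0, 1], higher = more likely private."""
    # Assumed schedule: positions judged more likely to contain private information
    # get a higher temperature, flattening the teacher's distribution there.
    temps = t_low + (t_high - t_low) * privacy_scores   # (batch, seq_len)
    temps = temps.unsqueeze(-1)                         # broadcast over the vocab dim

    soft_targets = F.softmax(teacher_logits / temps, dim=-1)
    # Inject noise into the soft targets to avoid unshielded knowledge transfer,
    # then clamp and renormalize so they remain a valid distribution.
    soft_targets = soft_targets + noise_std * torch.randn_like(soft_targets)
    soft_targets = soft_targets.clamp_min(1e-8)
    soft_targets = soft_targets / soft_targets.sum(dim=-1, keepdim=True)

    log_probs = F.log_softmax(student_logits / temps, dim=-1)
    # Token-level KL divergence between the noisy teacher targets and the student.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean")
```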
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Dynamic Temperature Knowledge Distillation [9.6046915661065]
Temperature plays a pivotal role in moderating label softness in knowledge distillation (KD).
Traditional approaches often employ a single static temperature throughout the KD process (the conventional static-temperature objective is sketched after this list).
We propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces dynamic, cooperative temperature control for both the teacher and student models simultaneously.
arXiv Detail & Related papers (2024-04-19T08:40:52Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models.
Most existing KD techniques rely on Kullback-Leibler (KL) divergence.
We propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- Students Parrot Their Teachers: Membership Inference on Model Distillation [54.392069096234074]
We study the privacy provided by knowledge distillation to both the teacher and student training sets.
Our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
arXiv Detail & Related papers (2023-03-06T19:16:23Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Undistillable: Making A Nasty Teacher That CANNOT teach students [84.6111281091602]
This paper introduces and investigates the concept of a Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, yet severely degrades the performance of any student model that attempts to distill from it.
We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation.
arXiv Detail & Related papers (2021-05-16T08:41:30Z)
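For contrast with the dynamic-temperature approaches above (DTKD and swing distillation), the sketch below shows the conventional knowledge-distillation objective with a single static temperature, in the spirit of the standard Hinton-style formulation; the temperature value and loss weighting are illustrative defaults rather than settings from any of the listed papers.

```python
# Conventional KD with one fixed temperature T shared by teacher and student.
import torch
import torch.nn.functional as F

def static_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """student_logits, teacher_logits: (batch, num_classes); labels: (batch,)."""
    # Soften both distributions with the same static temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

DTKD and swing distillation can both be read as replacing the fixed T above with a temperature that varies per sample, per token, or per training step.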