Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
- URL: http://arxiv.org/abs/2510.26038v1
- Date: Thu, 30 Oct 2025 00:34:16 GMT
- Title: Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
- Authors: Jiali Cheng, Chirag Agarwal, Hadi Amiri
- Abstract summary: This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities. To the best of our knowledge, this is the first study of the effect of KD on debiasing and its internal mechanism at scale.
- Score: 31.111748100296527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) is an effective method for model compression and transferring knowledge between models. However, its effect on a model's robustness against spurious correlations that degrade performance on out-of-distribution data remains underexplored. This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities from teacher models to student models on natural language inference (NLI) and image classification tasks. Through extensive experiments, we illustrate several key findings: (i) overall, the debiasing capability of a model is undermined post-KD; (ii) training a debiased model does not benefit from injecting teacher knowledge; (iii) although the overall robustness of a model may remain stable post-distillation, significant variations can occur across different types of biases; and (iv) we pinpoint the internal attention pattern and circuit that cause the distinct behavior post-KD. Given these findings, we propose three effective solutions to improve the distillability of debiasing methods: developing high-quality data for augmentation, implementing iterative knowledge distillation, and initializing student models with weights obtained from teacher models. To the best of our knowledge, this is the first study of the effect of KD on debiasing and its internal mechanism at scale. Our findings offer insight into how KD works and how to design better debiasing methods.
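The third proposed fix, initializing the student from teacher weights before distillation, can be combined with a standard distillation objective. The sketch below is a minimal, hypothetical PyTorch illustration under that assumption; the model classes, hyperparameters, and loss form are not taken from the paper.

```python
# Minimal sketch: standard KD loss plus student initialization from a
# (debiased) teacher's weights. All names and hyperparameters are
# illustrative; this is not the paper's released code.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft KL term on temperature-scaled logits plus hard cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def init_student_from_teacher(student: torch.nn.Module, teacher: torch.nn.Module):
    """Copy every teacher parameter whose name and shape match the student's."""
    s_state = student.state_dict()
    t_state = teacher.state_dict()
    for name, param in s_state.items():
        if name in t_state and t_state[name].shape == param.shape:
            s_state[name] = t_state[name].clone()
    student.load_state_dict(s_state)
```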
Related papers
- Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation [50.784080714897776]
Knowledge distillation (KD) is a core component in the training and deployment of modern generative models. We show that KD induces a trade-off between precision and recall in the student model. Our analysis provides a simple and general explanation for the effectiveness of KD in generative modeling.
arXiv Detail & Related papers (2025-05-19T13:39:47Z) - CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation [57.91828170220308]
We propose a knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models. Our simple yet effective CustomKD customizes the well-generalized features inherent in LVFMs to a given student model in order to reduce model discrepancies.
arXiv Detail & Related papers (2025-03-23T23:53:08Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed adaptive explicit knowledge transfer (AEKT) method achieves improved performance compared to the state-of-the-art KD methods.
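As a rough illustration of distilling the non-target class distribution, the sketch below drops the target logit and matches the renormalized teacher and student distributions over the remaining classes. It is an assumption-laden illustration of the general idea, not the AEKT loss or its adaptive weighting.

```python
# Hedged sketch: KL divergence over non-target classes only, illustrating
# transfer of the teacher's non-target class distribution (not the exact
# AEKT formulation).
import torch
import torch.nn.functional as F

def non_target_kd(student_logits, teacher_logits, labels, T=4.0):
    n_classes = student_logits.size(-1)
    keep = F.one_hot(labels, n_classes) == 0            # True for non-target classes
    s = student_logits[keep].view(-1, n_classes - 1)    # drop the target column
    t = teacher_logits[keep].view(-1, n_classes - 1)
    return F.kl_div(
        F.log_softmax(s / T, dim=-1),
        F.softmax(t / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```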
arXiv Detail & Related papers (2024-09-03T07:42:59Z) - Improve Knowledge Distillation via Label Revision and Data Selection [37.74822443555646]
This paper proposes to rectify the teacher's inaccurate predictions using the ground truth.
It also introduces a data selection technique to choose suitable training samples to be supervised by the teacher.
Experimental results demonstrate the effectiveness of the proposed method and show that it can be combined with other distillation approaches.
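A hedged sketch of the label-revision idea follows: when the teacher's prediction disagrees with the ground truth, part of the probability mass is shifted to the true class before distillation. The blending rule and coefficient are assumptions for illustration, not the paper's exact rectification.

```python
# Illustrative label revision: correct the teacher's soft labels with the
# ground truth when the teacher is wrong. The 50/50 blend is an assumption.
import torch
import torch.nn.functional as F

def revise_teacher_probs(teacher_logits, labels, blend=0.5):
    probs = F.softmax(teacher_logits, dim=-1)
    one_hot = F.one_hot(labels, probs.size(-1)).float()
    wrong = (probs.argmax(dim=-1) != labels).float().unsqueeze(-1)
    revised = blend * one_hot + (1.0 - blend) * probs
    # Keep correct teacher predictions untouched; revise the incorrect ones.
    return (1.0 - wrong) * probs + wrong * revised
```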
arXiv Detail & Related papers (2024-04-03T02:41:16Z) - Comparative Knowledge Distillation [102.35425896967791]
Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state-of-the-art data augmentation and KD techniques.
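One plausible reading of "comparative" distillation is that the student matches the teacher's pairwise differences between sample representations rather than the representations themselves. The sketch below follows that reading and is an assumption, not the paper's actual objective.

```python
# Hedged sketch: match pairwise sample-to-sample differences of teacher and
# student features (one possible reading of "comparative" distillation).
import torch
import torch.nn.functional as F

def comparative_loss(student_feats, teacher_feats):
    # (B, D) -> (B, B, D) tensors of pairwise differences within the batch.
    s_diff = student_feats.unsqueeze(1) - student_feats.unsqueeze(0)
    t_diff = teacher_feats.unsqueeze(1) - teacher_feats.unsqueeze(0)
    return F.mse_loss(s_diff, t_diff)
```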
arXiv Detail & Related papers (2023-11-03T21:55:33Z) - On the Impact of Knowledge Distillation for Model Interpretability [22.18694053092722]
Knowledge distillation (KD) enhances the interpretability as well as the accuracy of models.
We attribute the improvement in interpretability to the class-similarity information transferred from the teacher to student models.
Our results suggest that models distilled from large teacher models can be used more reliably in various fields.
arXiv Detail & Related papers (2023-05-25T05:35:11Z) - Learning Interpretation with Explainable Knowledge Distillation [28.00216413365036]
Knowledge Distillation (KD) has been considered as a key solution in model compression and acceleration in recent years.
We propose a novel explainable knowledge distillation model, called XDistillation, through which both the performance and the explanations' information are transferred from the teacher model to the student model.
Our experiments show that models trained by XDistillation outperform those trained by conventional KD methods in terms of both predictive accuracy and faithfulness to the teacher models.
arXiv Detail & Related papers (2021-11-12T21:18:06Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
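A minimal mixup-style distillation step in the spirit of MixKD is sketched below. Interpolating continuous inputs directly (rather than, say, token embeddings) and the simple sum of loss terms are simplifying assumptions, not MixKD's exact recipe.

```python
# Hedged sketch of a mixup-based distillation step: distill on interpolated
# examples and mix the hard-label loss with the same coefficient.
import torch
import torch.nn.functional as F

def mixkd_step(student, teacher, inputs, labels, T=2.0, beta=0.4):
    lam = torch.distributions.Beta(beta, beta).sample().item()
    idx = torch.randperm(inputs.size(0))
    mixed = lam * inputs + (1.0 - lam) * inputs[idx]     # mixup interpolation
    with torch.no_grad():
        t_logits = teacher(mixed)
    s_logits = student(mixed)
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = lam * F.cross_entropy(s_logits, labels) \
        + (1.0 - lam) * F.cross_entropy(s_logits, labels[idx])
    return kd + ce
```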
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - Understanding and Improving Knowledge Distillation [13.872105118381938]
Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget.
This paper categorizes the teacher's knowledge into three hierarchical levels and studies their effects on knowledge distillation.
arXiv Detail & Related papers (2020-02-10T04:21:41Z)