Teacher's pet: understanding and mitigating biases in distillation
- URL: http://arxiv.org/abs/2106.10494v1
- Date: Sat, 19 Jun 2021 13:06:25 GMT
- Title: Teacher's pet: understanding and mitigating biases in distillation
- Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, and Sanjiv Kumar
- Abstract summary: Several works have shown that distillation significantly boosts the student's overall performance.
However, are these gains uniform across all data subgroups?
We show that distillation can harm performance on certain subgroups.
We present techniques which soften the teacher influence for subgroups where it is less reliable.
- Score: 61.44867470297283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is widely used as a means of improving the performance
of a relatively simple student model using the predictions from a complex
teacher model. Several works have shown that distillation significantly boosts
the student's overall performance; however, are these gains uniform across all
data subgroups? In this paper, we show that distillation can harm performance
on certain subgroups, e.g., classes with few associated samples. We trace this
behaviour to errors made by the teacher distribution being transferred to and
amplified by the student model. To mitigate this problem, we present techniques
which soften the teacher influence for subgroups where it is less reliable.
Experiments on several image classification benchmarks show that these
modifications of distillation maintain the boost in overall accuracy, while
additionally ensuring improvements in subgroup performance.
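
The exact reweighting scheme is not spelled out in this digest, but the idea of softening the teacher's influence where it is unreliable can be sketched concretely. Below is a minimal PyTorch-style sketch, assuming a hypothetical per-class reliability weight `class_weight` (e.g., estimated from the teacher's held-out accuracy on each class) that interpolates between the distillation term and plain cross-entropy; it illustrates the idea, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def subgroup_aware_kd_loss(student_logits, teacher_logits, labels,
                           class_weight, temperature=2.0):
    """Distillation loss whose teacher term is down-weighted per class.

    class_weight: (num_classes,) tensor in [0, 1]; 1.0 trusts the teacher
    fully, 0.0 falls back to plain cross-entropy. Estimating it from, e.g.,
    the teacher's per-class validation accuracy is an assumption, not the
    paper's exact recipe.
    """
    # Softened distillation term (scaled by T^2, as is conventional).
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher,
                  reduction="none").sum(-1) * temperature ** 2

    # Plain supervised term on the one-hot labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Per-example trust in the teacher, looked up by the example's class.
    w = class_weight[labels]
    return (w * kd + (1.0 - w) * ce).mean()
```

A natural choice is to let a class's weight shrink with its sample count, reflecting the observation above that classes with few associated samples are the ones distillation tends to hurt.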
Related papers
- Adaptive Group Robust Ensemble Knowledge Distillation [6.4989916051093815]
We propose Adaptive Group Robust Ensemble Knowledge Distillation (AGRE-KD) to ensure that the student model receives knowledge beneficial for unknown underrepresented subgroups.
Our method selectively chooses teachers whose knowledge better improves the worst-performing subgroups by upweighting teachers whose gradient directions deviate from those of the biased model; a sketch of such a weighting follows this entry.
arXiv Detail & Related papers (2024-11-22T14:44:51Z)
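
The AGRE-KD summary above describes its teacher weighting only at a high level. The sketch below is one plausible reading, assuming each teacher's weight grows with how far the gradient of imitating that teacher deviates (in cosine distance) from the gradient of imitating a known biased model; the function name and the softmax normalization are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def _flat_grad(loss, params):
    # Flatten the gradient of `loss` w.r.t. `params` into one vector.
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def teacher_weights_from_gradients(student, teachers, biased_model, x,
                                   temperature=1.0):
    """Hypothetical AGRE-KD-style weights: teachers whose distillation
    gradient points away from the biased model's gradient get larger weights."""
    params = [p for p in student.parameters() if p.requires_grad]
    s_log_p = F.log_softmax(student(x) / temperature, dim=-1)

    def imitation_grad(model):
        with torch.no_grad():
            p = F.softmax(model(x) / temperature, dim=-1)
        return _flat_grad(F.kl_div(s_log_p, p, reduction="batchmean"), params)

    g_biased = imitation_grad(biased_model)
    scores = []
    for t in teachers:
        g_t = imitation_grad(t)
        # Cosine distance from the biased direction; larger means more deviation.
        scores.append(1.0 - F.cosine_similarity(g_t, g_biased, dim=0))
    return torch.softmax(torch.stack(scores), dim=0)
```

The resulting weights would then scale each teacher's contribution in the ensemble distillation loss.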
- What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias [1.03590082373586]
As many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy.
This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness.
arXiv Detail & Related papers (2024-10-10T22:43:00Z)
- Logit Standardization in Knowledge Distillation [83.31794439964033]
Assuming a shared temperature between teacher and student implicitly forces an exact match between their logits in range and variance.
We propose setting the temperature to the weighted standard deviation of the logits and applying a plug-and-play Z-score pre-processing step that standardizes them.
This pre-processing lets the student focus on the essential logit relations from the teacher rather than matching magnitudes, and it improves the performance of existing logit-based distillation methods; a sketch of the standardization follows this entry.
arXiv Detail & Related papers (2024-03-03T07:54:03Z)
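
The Z-score pre-processing in the logit-standardization entry above is simple to state. A minimal sketch, assuming both logit vectors are standardized to zero mean and unit standard deviation before a shared base temperature is applied (the weighted-standard-deviation temperature mentioned in the summary is simplified away here):

```python
import torch
import torch.nn.functional as F

def zscore(logits, eps=1e-7):
    # Standardize each logit vector to zero mean and unit standard deviation,
    # so only the relative shape of the logits is preserved.
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    return (logits - mean) / (std + eps)

def standardized_kd_loss(student_logits, teacher_logits, base_temperature=2.0):
    # Z-score both sides, then apply a shared base temperature; the student
    # need not match the teacher's logit range or variance, only its relations.
    s = zscore(student_logits) / base_temperature
    t = zscore(teacher_logits) / base_temperature
    return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                    reduction="batchmean")
```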
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning; a simplified sketch of pruning-while-distilling follows this entry.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
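
HomoDistil initializes the student from the teacher and prunes it gradually while distilling; the importance criterion is not given in this digest. The sketch below substitutes plain magnitude pruning of linear layers and a logit-matching loss as stand-ins, so it illustrates the pruning-while-distilling loop rather than the paper's exact method (which is task-agnostic and also matches intermediate representations).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def magnitude_prune_(model, sparsity):
    # Zero out the smallest-magnitude weights of every Linear layer in place.
    # A stand-in for HomoDistil's importance-based pruning, not its criterion.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())

def distill_with_iterative_pruning(student, teacher, loader, optimizer,
                                   final_sparsity=0.5, epochs=10,
                                   temperature=2.0):
    teacher.eval()
    for epoch in range(epochs):
        # Ramp sparsity up gradually so the student stays close to the teacher.
        magnitude_prune_(student, final_sparsity * (epoch + 1) / epochs)
        for x, _ in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                            F.softmax(t_logits / temperature, dim=-1),
                            reduction="batchmean") * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```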
- Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
We study the generalization behavior of a distilled student.
The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions.
We demonstrate the efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
arXiv Detail & Related papers (2023-01-28T16:34:47Z)
- Robust Distillation for Worst-class Performance [38.80008602644002]
We develop distillation techniques that are tailored to improve the student's worst-class performance.
We show empirically that our robust distillation techniques achieve better worst-class performance.
We provide insights into what makes a good teacher when the goal is to train a robust student; a sketch of worst-class-weighted distillation follows this entry.
arXiv Detail & Related papers (2022-06-13T21:17:00Z)
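
The robust-distillation entry above targets worst-class performance. One standard way to realize this is a distributionally-robust (group-DRO-style) reweighting that emphasizes the classes with the highest current distillation loss; the sketch below shows that generic reweighting and is not necessarily the paper's exact objective. The persistent `class_log_weights` tensor and the exponentiated-gradient step size are assumptions.

```python
import torch
import torch.nn.functional as F

def worst_class_weighted_kd(student_logits, teacher_logits, labels,
                            class_log_weights, num_classes,
                            temperature=2.0, step_size=0.1):
    """One DRO-style step: reweight per-class distillation losses so the
    currently hardest classes dominate. `class_log_weights` is a
    (num_classes,) tensor the caller keeps across batches."""
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="none").sum(-1) * temperature ** 2

    with torch.no_grad():
        # Average distillation loss per class present in the batch.
        class_loss = torch.zeros(num_classes, device=kd.device)
        class_count = torch.zeros(num_classes, device=kd.device)
        class_loss.index_add_(0, labels, kd.detach())
        class_count.index_add_(0, labels, torch.ones_like(kd))
        avg = class_loss / class_count.clamp(min=1.0)
        # Exponentiated-gradient update: harder classes get larger weights.
        class_log_weights += step_size * avg
        weights = torch.softmax(class_log_weights, dim=0)

    return (weights[labels] * kd).sum() / weights[labels].sum().clamp(min=1e-8)
```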
- Unified and Effective Ensemble Knowledge Distillation [92.67156911466397]
Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model.
Many existing methods learn and distill the student model on labeled data only.
We propose a unified and effective ensemble knowledge distillation method that distills a single student model from an ensemble of teacher models on both labeled and unlabeled data; a sketch of such ensemble distillation follows this entry.
arXiv Detail & Related papers (2022-04-01T16:15:39Z)
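
The ensemble entry above distills a single student from several teachers on labeled and unlabeled data. A minimal sketch, assuming the teachers' softened probabilities are simply averaged and the supervised cross-entropy is added only when labels are available (the paper's actual combination of teachers may be more elaborate):

```python
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, teacher_logits_list, labels=None,
                     temperature=2.0):
    """Distill from the average of several teachers' softened predictions.
    `labels=None` marks an unlabeled batch, which contributes only the KD term."""
    # Average the teachers' probability distributions at temperature T.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  teacher_probs, reduction="batchmean") * temperature ** 2
    if labels is None:
        return kd
    # Labeled batches additionally get the usual supervised term.
    return kd + F.cross_entropy(student_logits, labels)
```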
- Why distillation helps: a statistical perspective [69.90148901064747]
Knowledge distillation is a technique for improving the performance of a simple "student" model using the predictions of a more complex "teacher" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
arXiv Detail & Related papers (2020-05-21T01:49:51Z)