Revisiting Knowledge Distillation under Distribution Shift
- URL: http://arxiv.org/abs/2312.16242v2
- Date: Sun, 7 Jan 2024 08:52:37 GMT
- Title: Revisiting Knowledge Distillation under Distribution Shift
- Authors: Songming Zhang and Ziyu Lyu and Xiaofeng Chen
- Abstract summary: We study the mechanism of knowledge distillation against distribution shift.
We propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts.
We reveal intriguing observations of poor teaching performance under distribution shifts.
- Score: 7.796685962570969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation transfers knowledge from large models into small
models, and has recently achieved remarkable results. However, few studies have
investigated the mechanism of knowledge distillation under distribution
shift. Distribution shift refers to the drift of the data distribution between
training and testing phases. In this paper, we reconsider the paradigm of
knowledge distillation by reformulating the objective function in shift
situations. For realistic scenarios, we propose a unified and systematic
framework to benchmark knowledge distillation against two general
distribution shifts: diversity shift and correlation shift. The evaluation
benchmark covers more than 30 methods from algorithmic, data-driven, and
optimization perspectives on five benchmark datasets. Overall, we conduct
extensive experiments on the student model. We reveal intriguing observations
of poor teaching performance under distribution shifts; in particular, complex
algorithms and data augmentation offer limited gains in many cases.
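For context, the objective the paper reformulates builds on the standard knowledge distillation loss (Hinton et al., 2015): a cross-entropy term on hard labels combined with a temperature-scaled KL divergence toward the teacher's soft predictions. Below is a minimal sketch of that standard objective in PyTorch; the temperature `T` and weight `alpha` are illustrative values, and the paper's shift-aware reformulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: CE on hard labels + temperature-scaled KL to the teacher.

    T and alpha are illustrative hyperparameters, not the paper's settings.
    """
    # Hard-label cross-entropy on the student's own predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL between teacher and student distributions at temperature T,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kl + (1.0 - alpha) * ce
```

Under distribution shift, the teacher's soft targets may themselves be unreliable on test-time data, which is precisely the regime the paper's benchmark probes.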
Related papers
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning [51.177789437682954]
Class-incremental learning (CIL) seeks to enable a model to sequentially learn new classes while retaining knowledge of previously learned ones.
Balancing flexibility and stability remains a significant challenge, particularly when the task ID is unknown.
We propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration.
arXiv Detail & Related papers (2025-02-11T13:57:30Z) - On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning [17.819582979803286]
Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios.
To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from the learned data distribution.
arXiv Detail & Related papers (2024-12-15T02:10:18Z) - Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift [9.530897053573186]
This paper introduces a novel dissimilarity measure that utilizes vicinity information, i.e., the local structure of data points.
We characterize the excess error using the proposed measure and demonstrate faster or competitive convergence rates compared to previous techniques.
arXiv Detail & Related papers (2024-05-27T07:55:27Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts
in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs)
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Towards Effective Collaborative Learning in Long-Tailed Recognition [16.202524991074416]
Real-world data usually suffers from severe class imbalance and long-tailed distributions, where minority classes are significantly underrepresented.
Recent research favors multi-expert architectures to mitigate model uncertainty on minority classes.
In this paper, we observe that the knowledge transfer between experts is imbalanced in terms of class distribution, which results in limited performance improvement of the minority classes.
arXiv Detail & Related papers (2023-05-05T09:16:06Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z) - Robust Generalization despite Distribution Shift via Minimum
Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
arXiv Detail & Related papers (2021-06-08T15:25:35Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model; a generic multi-teacher aggregation sketch is given after this list.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
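As a companion to the LFME entry above, here is a minimal, hypothetical sketch of the basic multi-teacher aggregation idea: average the temperature-softened predictions of several expert teachers and distill the student toward that average. This is not LFME's actual method, which additionally uses self-paced weighting of experts and instances; `T` and `alpha` are illustrative values only.

```python
import torch
import torch.nn.functional as F

def multi_expert_soft_targets(expert_logits_list, T=2.0):
    """Average the temperature-softened predictions of several expert teachers.

    Only the basic aggregation step is shown; LFME's self-paced expert and
    instance weighting is not modeled here.
    """
    probs = [F.softmax(logits / T, dim=1) for logits in expert_logits_list]
    return torch.stack(probs, dim=0).mean(dim=0)

def student_distill_loss(student_logits, expert_logits_list, labels, T=2.0, alpha=0.5):
    # Distill toward the averaged expert distribution, plus a hard-label term.
    target = multi_expert_soft_targets(expert_logits_list, T)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        target,
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```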
This list is automatically generated from the titles and abstracts of the papers in this site.