Revisiting Knowledge Distillation under Distribution Shift
- URL: http://arxiv.org/abs/2312.16242v2
- Date: Sun, 7 Jan 2024 08:52:37 GMT
- Title: Revisiting Knowledge Distillation under Distribution Shift
- Authors: Songming Zhang and Ziyu Lyu and Xiaofeng Chen
- Abstract summary: We study the mechanism of knowledge distillation against distribution shift.
We propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts.
We reveal intriguing observations of poor teaching performance under distribution shifts.
- Score: 7.796685962570969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation transfers knowledge from large models into small
models, and has recently achieved remarkable results. However, few studies have
investigated the mechanism of knowledge distillation under distribution
shift. Distribution shift refers to the drift of the data distribution between
training and testing phases. In this paper, we reconsider the paradigm of
knowledge distillation by reformulating the objective function in shift
situations. For realistic scenarios, we propose a unified and systematic
framework to benchmark knowledge distillation against two general
distribution shifts: diversity shift and correlation shift. The evaluation
benchmark covers more than 30 methods from algorithmic, data-driven, and
optimization perspectives on five benchmark datasets. Overall, we conduct
extensive experiments on the student model. We reveal intriguing observations
of poor teaching performance under distribution shifts; in particular, complex
algorithms and data augmentation offer limited gains in many cases.
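For context, the objective the paper reformulates builds on the standard knowledge distillation loss (Hinton et al., 2015): a cross-entropy term on hard labels combined with a temperature-scaled KL divergence toward the teacher's soft predictions. Below is a minimal sketch of that standard objective in PyTorch; the temperature `T` and weight `alpha` are illustrative values, and the paper's shift-aware reformulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: CE on hard labels + temperature-scaled KL to the teacher.

    T and alpha are illustrative hyperparameters, not the paper's settings.
    """
    # Hard-label cross-entropy on the student's own predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL between teacher and student distributions at temperature T,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kl + (1.0 - alpha) * ce
```

Under distribution shift, the teacher's soft targets may themselves be unreliable on test-time data, which is precisely the regime the paper's benchmark probes.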
Related papers
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning [51.177789437682954]
Class-incremental learning (CIL) seeks to enable a model to sequentially learn new classes while retaining knowledge of previously learned ones.
Balancing flexibility and stability remains a significant challenge, particularly when the task ID is unknown.
We propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration.
arXiv Detail & Related papers (2025-02-11T13:57:30Z) - On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning [17.819582979803286]
Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios.
To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from the learned data distribution.
arXiv Detail & Related papers (2024-12-15T02:10:18Z) - Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift [9.530897053573186]
This paper introduces a novel dissimilarity measure that utilizes vicinity information, i.e., the local structure of data points.
We characterize the excess error using the proposed measure and demonstrate faster or competitive convergence rates compared to previous techniques.
arXiv Detail & Related papers (2024-05-27T07:55:27Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts
in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs)
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Towards Effective Collaborative Learning in Long-Tailed Recognition [16.202524991074416]
Real-world data usually suffers from severe class imbalance and long-tailed distributions, where minority classes are significantly underrepresented.
Recent research favors multi-expert architectures to mitigate model uncertainty on minority classes.
In this paper, we observe that the knowledge transfer between experts is imbalanced in terms of class distribution, which results in limited performance improvement of the minority classes.
arXiv Detail & Related papers (2023-05-05T09:16:06Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z) - Robust Generalization despite Distribution Shift via Minimum
Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
arXiv Detail & Related papers (2021-06-08T15:25:35Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model; a generic multi-teacher aggregation sketch is given after this list.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
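As a companion to the LFME entry above, here is a minimal, hypothetical sketch of the basic multi-teacher aggregation idea: average the temperature-softened predictions of several expert teachers and distill the student toward that average. This is not LFME's actual method, which additionally uses self-paced weighting of experts and instances; `T` and `alpha` are illustrative values only.

```python
import torch
import torch.nn.functional as F

def multi_expert_soft_targets(expert_logits_list, T=2.0):
    """Average the temperature-softened predictions of several expert teachers.

    Only the basic aggregation step is shown; LFME's self-paced expert and
    instance weighting is not modeled here.
    """
    probs = [F.softmax(logits / T, dim=1) for logits in expert_logits_list]
    return torch.stack(probs, dim=0).mean(dim=0)

def student_distill_loss(student_logits, expert_logits_list, labels, T=2.0, alpha=0.5):
    # Distill toward the averaged expert distribution, plus a hard-label term.
    target = multi_expert_soft_targets(expert_logits_list, T)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        target,
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```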
This list is automatically generated from the titles and abstracts of the papers in this site.