De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
- URL: http://arxiv.org/abs/2403.19539v1
- Date: Thu, 28 Mar 2024 16:13:22 GMT
- Title: De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
- Authors: Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi
- Abstract summary: Data-Free Knowledge Distillation (DFKD) is a promising task that trains high-performance small models for practical deployment without relying on the original training data.
Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data.
This paper proposes a novel causal-inference perspective to disentangle the student models from the impact of such shifts.
- Score: 32.1016787150064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) is a promising task that trains high-performance small models for practical deployment without relying on the original training data. Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data. However, a long-overlooked issue is the severe distribution shift between this substitute data and the original data, which manifests as large differences in image quality and class proportions. This harmful shift is essentially a confounder that causes significant performance bottlenecks. To tackle the issue, this paper proposes a novel causal-inference perspective to disentangle the student models from the impact of such shifts. By designing a customized causal graph, we first reveal the causalities among the variables in the DFKD task. Subsequently, we propose a Knowledge Distillation Causal Intervention (KDCI) framework based on the backdoor adjustment to de-confound the confounder. KDCI can be flexibly combined with most existing state-of-the-art baselines. Experiments in combination with six representative DFKD methods demonstrate the effectiveness of our KDCI, which clearly helps existing methods under almost all settings, e.g., improving the baseline by up to 15.54% accuracy on the CIFAR-100 dataset.
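The abstract invokes the backdoor adjustment, P(Y | do(X)) = Σ_z P(Y | X, z) P(z), to remove the confounding effect of the distribution shift in the substitute data. The snippet below is only a minimal sketch of that idea inside a distillation loss, assuming the confounder can be approximated by a discrete partition of each synthetic batch (e.g., cluster ids) with known priors; it is an illustration, not the KDCI construction from the paper, and the function names are hypothetical.
```python
# Minimal sketch (not the paper's KDCI): de-confounding a KD loss by
# marginalising over an assumed discrete confounder partition "strata".
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard temperature-scaled KL distillation loss."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def deconfounded_kd_loss(student_logits, teacher_logits, strata, priors, T=4.0):
    """Backdoor-style aggregation: weight each stratum's KD loss by its
    prior P(z) rather than by its (possibly skewed) share of the batch."""
    total = student_logits.new_zeros(())
    for z, p_z in enumerate(priors):   # priors: iterable of P(z), summing to 1
        mask = strata == z             # strata: LongTensor of stratum ids per sample
        if mask.any():
            total = total + p_z * kd_loss(student_logits[mask], teacher_logits[mask], T)
    return total
```
In practice the strata and priors would have to come from an observable proxy of the confounder (for instance, clusters of the synthetic images or the teacher's predicted class proportions); the paper defines its actual intervention on its customized causal graph.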
Related papers
- Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation [20.556083321381514]
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression.
This paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA).
Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method.
arXiv Detail & Related papers (2024-10-23T07:01:16Z)
- Condensed Sample-Guided Model Inversion for Knowledge Distillation [42.91823325342862]
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model.
KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data.
In this paper, we consider condensed samples as a form of supplementary information, and introduce a method for using them to better approximate the target data distribution.
arXiv Detail & Related papers (2024-08-25T14:43:27Z)
- PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning [49.60634126342945]
Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes.
Recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information.
We employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues.
arXiv Detail & Related papers (2024-06-09T07:29:55Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models.
Most existing KD techniques rely on Kullback-Leibler (KL) divergence.
We propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- Sampling to Distill: Knowledge Transfer from Open-World Data [28.74835717488114]
We propose a novel Open-world Data Sampling Distillation (ODSD) method for the Data-Free Knowledge Distillation (DFKD) task without the redundant generation process.
First, we sample open-world data close to the original data's distribution using an adaptive sampling module.
Then, we build structured relationships of multiple data examples to exploit data knowledge through the student model itself and the teacher's structured representation.
arXiv Detail & Related papers (2023-07-31T12:05:55Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation by eliminating the dependence on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- Federated Causal Discovery [74.37739054932733]
This paper develops a gradient-based learning framework named DAG-Shared Federated Causal Discovery (DS-FCD).
It can learn the causal graph without directly touching local data and naturally handle the data heterogeneity.
Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
arXiv Detail & Related papers (2021-12-07T08:04:12Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Robustness and Diversity Seeking Data-Free Knowledge Distillation [15.768480007698876]
We propose robustness and diversity seeking data-free KD (RDSKD) in this paper.
With MNIST, CIFAR-10, and SVHN datasets, our experiments show that RDSKD achieves higher accuracy with more robustness across different hyperparameter settings.
arXiv Detail & Related papers (2020-11-07T10:57:53Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
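For reference, the snippet below is a generic sketch of the mixup-style distillation idea summarized in the MixKD entry above: the student is distilled on convex combinations of the available inputs to enlarge the effective training distribution. It is an illustrative approximation under that assumption, not the exact MixKD recipe (which targets large-scale language models), and the function name is hypothetical.
```python
# Hypothetical sketch of mixup-style distillation on limited data.
import torch
import torch.nn.functional as F

def mixup_kd_step(student, teacher, x, T=2.0, alpha=0.4):
    """One distillation step on interpolated inputs (generic, illustrative)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x + (1.0 - lam) * x[torch.randperm(x.size(0))]  # mixed batch
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x_mix) / T, dim=1)          # soft targets
    log_p_student = F.log_softmax(student(x_mix) / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```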