Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration
- URL: http://arxiv.org/abs/2412.08939v2
- Date: Tue, 17 Dec 2024 06:30:00 GMT
- Title: Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration
- Authors: Yunshuai Zhou, Junbo Qiao, Jincheng Liao, Wei Li, Simiao Li, Jiao Xie, Yunhang Shen, Jie Hu, Shaohui Lin
- Abstract summary: We propose a novel dynamic contrastive knowledge distillation (DCKD) framework for image restoration.
Specifically, we introduce dynamic contrastive regularization to perceive the student's learning state.
We also propose a distribution mapping module to extract and align the pixel-level category distribution of the teacher and student models.
- Score: 17.27061613884289
- License:
- Abstract: Knowledge distillation (KD) is a valuable yet challenging approach that enhances a compact student network by learning from a high-performance but cumbersome teacher model. However, previous KD methods for image restoration overlook the state of the student during distillation, adopting a fixed solution space that limits the capability of KD. Additionally, relying solely on an L1-type loss makes it difficult to leverage the distribution information of images. In this work, we propose a novel dynamic contrastive knowledge distillation (DCKD) framework for image restoration. Specifically, we introduce dynamic contrastive regularization to perceive the student's learning state and dynamically adjust the distilled solution space using contrastive learning. We also propose a distribution mapping module to extract and align the pixel-level category distribution of the teacher and student models. Note that the proposed DCKD is a structure-agnostic distillation framework, which can adapt to different backbones and can be combined with methods that optimize upper-bound constraints to further enhance model performance. Extensive experiments demonstrate that DCKD significantly outperforms state-of-the-art KD methods across various image restoration tasks and backbones.
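As a rough, non-authoritative illustration of these two ingredients (not the authors' released code), the sketch below shows a contrastive regularizer that pulls the student output toward the teacher and pushes it away from a set of negative restorations, together with a pixel-level distribution alignment term; the function names, the soft-histogram mapping, and the loss form are assumptions.
```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_out, teacher_out, negatives):
    """Contrastive regularizer sketch: attract the student output to the
    teacher output (positive) and repel it from negative restorations
    (e.g. outputs of weaker models or earlier student checkpoints).
    A dynamic scheme would update the negative set as the student's
    learning state improves."""
    pos = F.l1_loss(student_out, teacher_out)
    negs = torch.stack([F.l1_loss(student_out, n) for n in negatives])
    # ratio form: minimize distance to the positive relative to the negatives
    return pos / (negs.mean() + 1e-8)

def pixel_distribution_loss(student_out, teacher_out, bins=16, tau=0.01):
    """Illustrative stand-in for a distribution mapping module: map each pixel
    onto a soft histogram over intensity bins and align teacher and student
    with a KL divergence."""
    centers = torch.linspace(0.0, 1.0, bins, device=student_out.device)
    def soft_hist(x):                      # x in [0, 1], shape [B, C, H, W]
        logits = -(x.unsqueeze(-1) - centers) ** 2 / tau
        return F.softmax(logits, dim=-1)   # [B, C, H, W, bins]
    p_t = soft_hist(teacher_out).clamp_min(1e-8)
    p_s = soft_hist(student_out).clamp_min(1e-8)
    return F.kl_div(p_s.log(), p_t, reduction="batchmean")
```
In practice, terms like these would be added to the usual reconstruction loss with small weights.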
Related papers
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- Discriminative and Consistent Representation Distillation [6.24302896438145]
Discriminative and Consistent Distillation (DCD) employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations.
Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives.
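A minimal PyTorch sketch of such an objective is given below, assuming an InfoNCE-style contrast over paired teacher/student embeddings with a learnable temperature and bias plus an MSE consistency term; the names and weighting are illustrative, not the paper's implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveConsistencyKD(nn.Module):
    """Assumed DCD-style objective: InfoNCE contrast between student and
    teacher embeddings with learnable temperature/bias, plus consistency."""
    def __init__(self, alpha=1.0, beta=1.0):
        super().__init__()
        self.log_temp = nn.Parameter(torch.zeros(()))  # learnable temperature (log-space)
        self.bias = nn.Parameter(torch.zeros(()))      # learnable logit bias
        self.alpha, self.beta = alpha, beta

    def forward(self, z_s, z_t):           # z_s, z_t: [B, D] embeddings
        z_s = F.normalize(z_s, dim=1)
        z_t = F.normalize(z_t, dim=1)
        logits = z_s @ z_t.t() / self.log_temp.exp() + self.bias
        labels = torch.arange(z_s.size(0), device=z_s.device)  # positives on the diagonal
        contrast = F.cross_entropy(logits, labels)
        consistency = F.mse_loss(z_s, z_t)
        return self.alpha * contrast + self.beta * consistency
```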
arXiv Detail & Related papers (2024-07-16T14:53:35Z)
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
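One plausible instantiation of relative-difficulty weighting for pixel-level distillation is sketched below; the weighting formula is an assumption, not the authors' exact definition.
```python
import torch
import torch.nn.functional as F

def relative_difficulty_weights(logits_t, logits_s, labels, ignore_index=255):
    """Per-pixel weights for a distillation loss: pixels the student finds hard
    relative to the teacher receive larger weights, steering the learning focus
    without introducing an extra optimization goal."""
    with torch.no_grad():
        ce_t = F.cross_entropy(logits_t, labels, ignore_index=ignore_index, reduction="none")
        ce_s = F.cross_entropy(logits_s, labels, ignore_index=ignore_index, reduction="none")
        w = ce_s / (ce_t + 1e-6)        # relative difficulty per pixel, [B, H, W]
        w = w / (w.mean() + 1e-6)       # keep the overall loss scale stable
    return w                            # multiply into a pixel-wise KD term
```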
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution [6.983043882738687]
We propose a novel Multi-Teacher Knowledge Distillation (MTKD) framework specifically for image super-resolution.
It exploits the advantages of multiple teachers by combining and enhancing the outputs of these teacher models.
We fully evaluate the effectiveness of the proposed method by comparing it to five commonly used KD methods for image super-resolution.
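A hedged sketch of one way to fuse several teacher outputs into a single distillation target is shown below (fidelity-weighted averaging against the ground truth); the actual MTKD aggregation may differ.
```python
import torch
import torch.nn.functional as F

def fuse_teacher_outputs(teacher_srs, hr=None, sharpness=100.0):
    """Combine a list of teacher SR outputs into one target. If the HR ground
    truth is available, weight each teacher by its per-image L1 fidelity;
    otherwise fall back to a plain average."""
    stacked = torch.stack(teacher_srs, dim=0)                  # [T, B, C, H, W]
    if hr is None:
        return stacked.mean(dim=0)
    err = torch.stack([F.l1_loss(sr, hr, reduction="none").mean(dim=(1, 2, 3))
                       for sr in teacher_srs], dim=0)          # [T, B]
    w = F.softmax(-sharpness * err, dim=0).view(len(teacher_srs), -1, 1, 1, 1)
    return (w * stacked).sum(dim=0)                            # [B, C, H, W]
```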
arXiv Detail & Related papers (2024-04-15T08:32:41Z)
- Revisiting Knowledge Distillation for Autoregressive Language Models [88.80146574509195]
We propose a simple yet effective adaptive teaching approach (ATKD) to improve knowledge distillation (KD).
The core of ATKD is to reduce rote learning and make teaching more diverse and flexible.
Experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains.
arXiv Detail & Related papers (2024-02-19T07:01:10Z)
- Data Upcycling Knowledge Distillation for Image Super-Resolution [25.753554952896096]
Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from pre-trained teacher models to compact student models.
We present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher model's knowledge to the student through upcycled in-domain data derived from the training data.
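As a hedged illustration only, one simple way to "upcycle" existing pairs is to crop the HR target and re-degrade it into a fresh LR input that the teacher can label; the paper's actual upcycling recipe may differ.
```python
import torch
import torch.nn.functional as F

def upcycle_pair(hr, scale=4, crop=128):
    """Derive a new in-domain LR/HR training pair from an existing HR image by
    random cropping and bicubic downsampling (assumes hr is [B, C, H, W] with
    H, W >= crop and values in [0, 1])."""
    _, _, h, w = hr.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    hr_crop = hr[:, :, top:top + crop, left:left + crop]
    lr_crop = F.interpolate(hr_crop, scale_factor=1.0 / scale,
                            mode="bicubic", align_corners=False)
    return lr_crop, hr_crop
```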
arXiv Detail & Related papers (2023-09-25T14:13:26Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We argue that the essence of these methods is to discard the noisy information and distill the valuable information in the features.
We propose a novel KD method, dubbed DiffKD, that explicitly denoises and matches features using diffusion models.
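The toy stand-in below only conveys this denoise-then-match structure: a small refinement network "cleans" the student feature before it is matched to the teacher feature; the real method uses an actual diffusion model.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDenoiseKD(nn.Module):
    """Toy denoise-then-match feature distillation (illustrative assumption,
    not DiffKD's diffusion-based denoiser)."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, feat_s, feat_t):
        feat_s_clean = self.refine(feat_s)          # "denoise" the student feature
        return F.mse_loss(feat_s_clean, feat_t.detach())
```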
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- BD-KD: Balancing the Divergences for Online Knowledge Distillation [11.874952582465601]
We introduce BD-KD (Balanced Divergence Knowledge Distillation), a framework for logit-based online KD.
BD-KD enhances both accuracy and model calibration simultaneously, eliminating the need for post-hoc recalibration techniques.
Our method encourages student-centered training by adjusting the conventional online distillation loss on both the student and teacher sides.
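A hedged sketch of a balanced-divergence term is shown below, mixing the forward and reverse KL between student and teacher logits with a tunable weight; the exact balancing strategy in BD-KD may differ.
```python
import torch
import torch.nn.functional as F

def balanced_divergence_loss(logits_s, logits_t, tau=4.0, lam=0.5):
    """Mix KL(teacher || student) with KL(student || teacher) so that neither
    direction dominates the online distillation signal."""
    log_p_s = F.log_softmax(logits_s / tau, dim=1)
    log_p_t = F.log_softmax(logits_t / tau, dim=1)
    fwd = F.kl_div(log_p_s, log_p_t.exp(), reduction="batchmean")  # KL(teacher || student)
    rev = F.kl_div(log_p_t, log_p_s.exp(), reduction="batchmean")  # KL(student || teacher)
    return (tau ** 2) * (lam * fwd + (1.0 - lam) * rev)
```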
arXiv Detail & Related papers (2022-12-25T22:27:32Z)
- Dense Depth Distillation with Out-of-Distribution Simulated Images [30.79756881887895]
We study data-free knowledge distillation (KD) for monocular depth estimation (MDE).
Data-free KD learns a lightweight model for real-world depth perception tasks by compressing a trained teacher model without access to training data in the target domain.
We show that our method outperforms baseline KD by a good margin and even achieves slightly better performance with as few as 1/6 of the training images.
arXiv Detail & Related papers (2022-08-26T07:10:01Z)
- Aligning Logits Generatively for Principled Black-Box Knowledge Distillation [49.43567344782207]
Black-Box Knowledge Distillation (B2KD) formulates the problem of cloud-to-edge model compression with invisible data and models hosted on the server.
We formalize a two-step workflow consisting of deprivatization and distillation.
We propose a new method Mapping-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one.
arXiv Detail & Related papers (2022-05-21T02:38:16Z)
- Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification [75.27973258196934]
We propose a novel Categorical Relation-preserving Contrastive Knowledge Distillation (CRCKD) algorithm, which takes the commonly used mean-teacher model as the supervisor.
With this regularization, the feature distribution of the student model shows higher intra-class similarity and inter-class variance.
With the contribution of the CCD and CRP, our CRCKD algorithm can distill the relational knowledge more comprehensively.
arXiv Detail & Related papers (2021-07-07T13:56:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.