Heterogeneous Generative Knowledge Distillation with Masked Image
Modeling
- URL: http://arxiv.org/abs/2309.09571v2
- Date: Thu, 11 Jan 2024 14:07:11 GMT
- Title: Heterogeneous Generative Knowledge Distillation with Masked Image
Modeling
- Authors: Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang
Zhang
- Abstract summary: Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models.
We develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) method based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion.
Our method offers a simple yet effective paradigm for learning the visual representations and data distribution of heterogeneous teacher models.
- Score: 33.95780732124864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small CNN-based models usually require transferring knowledge from a large
model before they are deployed on computationally resource-limited edge
devices. Masked image modeling (MIM) methods achieve great success in various
visual tasks but remain largely unexplored in knowledge distillation for
heterogeneous deep models, mainly because of the significant architectural
discrepancy between Transformer-based large models and CNN-based small
networks. In this paper, we develop the first Heterogeneous Generative Knowledge
Distillation (H-GKD) method based on MIM, which can efficiently transfer
knowledge from large Transformer models to small CNN-based models in a
generative self-supervised fashion. Our method builds a bridge between
Transformer-based models and CNNs by training a UNet-style student with sparse
convolution, which can effectively mimic the visual representations inferred by
the teacher over masked image modeling. It is a simple yet effective paradigm
for learning the visual representations and data distribution of heterogeneous
teacher models, which can be pre-trained using advanced generative methods.
Extensive experiments show that it adapts well to various model architectures
and sizes, consistently achieving state-of-the-art performance in image
classification, object detection, and semantic segmentation tasks. For example,
on the ImageNet-1K dataset, H-GKD improves the accuracy of ResNet-50 (sparse)
from 76.98% to 80.01%.
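To make the training signal concrete, below is a minimal PyTorch sketch of the idea, not the authors' implementation. The paper uses true sparse convolution; this sketch emulates it by re-applying the patch mask after every dense convolution, so masked regions contribute nothing to the visible features. The masking ratio, the learned mask token, the dense decoder, and the smooth-L1 loss over masked patches are all illustrative assumptions, as are names such as `SparseCNNStudent` and `distill_step`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_patch_mask(batch, h_patches, w_patches, mask_ratio=0.6):
    """Per-image random mask over the patch grid: 1 = visible, 0 = masked."""
    n = h_patches * w_patches
    n_keep = int(n * (1 - mask_ratio))
    ranks = torch.rand(batch, n).argsort(dim=1).argsort(dim=1)
    return (ranks < n_keep).float().view(batch, 1, h_patches, w_patches)


class MaskedConvBlock(nn.Module):
    """Dense emulation of a sparse conv block: the mask is re-applied after
    every convolution so no feature is computed from masked-out regions."""

    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x, mask):
        x = self.conv(x * mask)
        mask = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
        return F.relu(self.bn(x)) * mask, mask


class SparseCNNStudent(nn.Module):
    """Toy UNet-style student: a masked CNN encoder, a learned token filling
    the masked positions, and a dense decoder predicting teacher features."""

    def __init__(self, teacher_dim=768):
        super().__init__()
        self.blocks = nn.ModuleList([
            MaskedConvBlock(3, 64, stride=2),     # 1/2
            MaskedConvBlock(64, 128, stride=2),   # 1/4
            MaskedConvBlock(128, 256, stride=2),  # 1/8
            MaskedConvBlock(256, 512, stride=2),  # 1/16 = ViT-B/16 patch stride
        ])
        self.mask_token = nn.Parameter(torch.zeros(1, 512, 1, 1))
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(),
            nn.Conv2d(512, teacher_dim, 1),       # align with teacher channels
        )

    def forward(self, img, patch_mask):
        mask = F.interpolate(patch_mask, size=img.shape[-2:], mode="nearest")
        x = img
        for blk in self.blocks:
            x, mask = blk(x, mask)
        # Fill masked positions with a learned token, then decode densely so
        # visible features can propagate into the masked regions.
        x = x * mask + self.mask_token * (1.0 - mask)
        return self.decoder(x)  # (B, teacher_dim, H/16, W/16)


def distill_step(student, teacher, img, mask_ratio=0.6):
    """One H-GKD-style step: regress frozen-teacher patch features on the
    patches the student never saw (this loss choice is an assumption)."""
    B, _, H, W = img.shape
    patch_mask = random_patch_mask(B, H // 16, W // 16, mask_ratio).to(img.device)
    with torch.no_grad():
        target = teacher(img)                     # (B, teacher_dim, H/16, W/16)
    pred = student(img, patch_mask)
    w = 1.0 - patch_mask                          # weight only masked patches
    loss = (F.smooth_l1_loss(pred, target, reduction="none") * w).sum()
    return loss / (w.sum() * pred.shape[1] + 1e-6)


# Smoke test with a stand-in teacher; a real setup would instead extract patch
# features from a frozen, MIM-pre-trained ViT.
student = SparseCNNStudent()
teacher = nn.Conv2d(3, 768, kernel_size=16, stride=16).eval()
distill_step(student, teacher, torch.randn(2, 3, 224, 224)).backward()
```

The property the sketch preserves is the one the abstract emphasizes: during encoding the student only ever computes on visible pixels, while the loss asks it to match the teacher's representation on the patches it never saw.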
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND).
Our approach is simple but effective, first using the weights and biases of multiple trained models as inputs to train an autoencoder and a latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z) - Fisher Mask Nodes for Language Model Merging [0.0]
We introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning.
Our method exhibits a consistent and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost.
arXiv Detail & Related papers (2024-03-14T21:52:26Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Advancing Plain Vision Transformer Towards Remote Sensing Foundation
Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to build large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z) - Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pre-trained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z) - Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.
We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.