Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN
- URL: http://arxiv.org/abs/2003.09088v1
- Date: Fri, 20 Mar 2020 03:20:52 GMT
- Title: Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN
- Authors: Jingwen Ye, Yixin Ji, Xinchao Wang, Xin Gao, Mingli Song
- Abstract summary: We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
- Score: 80.17705319689139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have provided procedures for learning one
network to amalgamate multiple streams of knowledge from pre-trained
Convolutional Neural Network (CNN) models, thus reducing the annotation cost.
However, almost all existing methods demand massive training data, which may be
unavailable due to privacy or transmission issues. In this paper, we propose a
data-free knowledge amalgamation strategy to craft a well-behaved multi-task
student network from multiple single/multi-task teachers. The main idea is to
construct group-stack generative adversarial networks (GANs) with two dual
generators. First, one generator is trained to collect the knowledge by
reconstructing images that approximate the original dataset used for
pre-training the teachers. Then a dual generator is trained by taking the
output of the former generator as input. Finally, we treat the dual-part
generator as the target network and regroup it. As demonstrated on several
multi-label classification benchmarks, the proposed method, without any
training data, achieves surprisingly competitive results, even compared with
some fully supervised methods.
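Read as pseudocode, the pipeline alternates two phases: a generator is trained against the frozen teachers to synthesize substitute training images, and the dual generator (the future student) is trained on those images to match the teachers' outputs. Below is a minimal PyTorch-style sketch of that loop; the losses, optimizer settings, and the single shared label space are simplifying assumptions, not the authors' exact group-stack implementation.

```python
# Hedged sketch of the group-stack dual-GAN idea (not the authors' code).
# Phase 1: train generator G so its fake images draw confident responses
# from the frozen teachers (data-free knowledge collection).
# Phase 2: train the dual generator (the student) to map G's fakes to the
# teachers' outputs; it is then regrouped as the multi-task student.
import torch
import torch.nn.functional as F

def train_data_free_amalgamation(G, student, teachers, steps, device="cuda"):
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
    for t in teachers:
        t.eval()  # teachers stay frozen throughout

    for step in range(steps):
        z = torch.randn(64, G.z_dim, device=device)  # z_dim: assumed attribute

        # --- Phase 1: generator collects knowledge from the teachers ---
        fake = G(z)
        loss_g = 0.0
        for t in teachers:
            logits = t(fake)
            # pseudo-label confidence: fakes should look "real" to the teacher
            loss_g = loss_g + F.cross_entropy(logits, logits.argmax(dim=1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

        # --- Phase 2: dual generator (student) mimics the teachers ---
        fake = G(z).detach()
        student_logits = student(fake)
        loss_s = 0.0
        for t in teachers:
            with torch.no_grad():
                teacher_logits = t(fake)
            # assumes matching output spaces; the paper groups multiple
            # single/multi-task teachers instead
            loss_s = loss_s + F.mse_loss(student_logits, teacher_logits)
        opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return student  # regrouped dual generator serves as the student network
```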
Related papers
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
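As a rough illustration of the instance-selection step, the sketch below keeps the web samples on which the combined teacher-student prediction is most confident; the scoring function is an assumption, not the exact KD$^3$ formulation.

```python
import torch
import torch.nn.functional as F

def select_web_instances(teacher, student, images, keep_ratio=0.5):
    """Keep webly collected samples with the most confident combined prediction.

    Hedged sketch: KD^3 dynamically selects useful instances from the
    combined predictions of both networks; the exact score is an assumption.
    """
    with torch.no_grad():
        p_t = F.softmax(teacher(images), dim=1)
        p_s = F.softmax(student(images), dim=1)
        combined = 0.5 * (p_t + p_s)
        score = combined.max(dim=1).values  # confidence of the combined view
    k = max(1, int(keep_ratio * images.size(0)))
    idx = score.topk(k).indices
    return images[idx]
```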
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Learning Modular Structures That Generalize Out-of-Distribution [1.7034813545878589]
We describe a method for out-of-distribution (OOD) generalization that, through training, encourages models to preserve only those features that are well reused across multiple training domains.
Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network to extract a modular sub-network that achieves better OOD performance than the original network.
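A probabilistic differentiable binary mask is commonly realized as a per-neuron stochastic gate with a Gumbel-Sigmoid relaxation; the sketch below shows that standard construction and is an assumption about the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class DifferentiableBinaryMask(nn.Module):
    """Per-neuron stochastic gate with a Gumbel-Sigmoid relaxation.

    Hedged sketch of a probabilistic differentiable binary mask; the
    paper's exact parameterization may differ.
    """
    def __init__(self, num_units, temperature=0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_units))
        self.temperature = temperature

    def forward(self, x):
        if self.training:
            u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)      # logistic noise
            gate = torch.sigmoid((self.logits + noise) / self.temperature)
        else:
            gate = (self.logits > 0).float()             # hard mask at test time
        return x * gate
```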
arXiv Detail & Related papers (2022-08-07T15:54:19Z)
- Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
However, deep networks lack generalisability: they will not perform as well on a new (test) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z)
- Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
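The feature similarity loss can be read as aligning intermediate features of the binary network with those of a full-precision reference; a minimal sketch follows, where the cosine formulation and the choice of layers are assumptions.

```python
import torch
import torch.nn.functional as F

def feature_similarity_loss(binary_feats, fp_feats):
    """Encourage binary-network features to align with full-precision ones.

    Hedged sketch of a feature similarity loss; BSSL's exact form
    (layers used, distance measure) is an assumption here.
    """
    loss = 0.0
    for fb, ff in zip(binary_feats, fp_feats):
        fb = F.normalize(fb.flatten(1), dim=1)
        ff = F.normalize(ff.flatten(1), dim=1)
        loss = loss + (1 - (fb * ff).sum(dim=1)).mean()  # 1 - cosine similarity
    return loss
```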
arXiv Detail & Related papers (2021-10-17T15:38:39Z)
- Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay [49.691610143011566]
We propose two novel knowledge transfer techniques for class-incremental learning (CIL).
First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model.
Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student.
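A hedged sketch of the dual-teacher distillation idea: the student matches a combination of the two teachers' temperature-softened outputs. The simple average and the temperature are assumptions, not the exact DT-ID objective.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits, t1_logits, t2_logits, T=2.0):
    """Distill from two teachers into one student via softened targets.

    Hedged sketch: DT-ID's exact combination of the two teachers is an
    assumption here (simple average of temperature-softened outputs).
    """
    p_t = 0.5 * (F.softmax(t1_logits / T, dim=1) + F.softmax(t2_logits / T, dim=1))
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```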
arXiv Detail & Related papers (2021-06-17T22:13:15Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
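A hedged sketch of the combined discriminator objective: per-token replaced-token detection plus selection of the original token from a small candidate set. The tensor shapes and the loss weighting `alpha` are assumptions.

```python
import torch
import torch.nn.functional as F

def electra_multiword_loss(rtd_logits, is_replaced, sel_logits, original_idx,
                           alpha=1.0):
    """Joint objective: replaced-token detection + multi-word selection.

    Hedged sketch. rtd_logits: (batch, seq) binary detection scores.
    sel_logits: (batch, seq, num_candidates) scores over a candidate set
    that contains the original token at index original_idx.
    """
    loss_rtd = F.binary_cross_entropy_with_logits(rtd_logits, is_replaced.float())
    loss_sel = F.cross_entropy(
        sel_logits.flatten(0, 1),   # (batch*seq, num_candidates)
        original_idx.flatten(),     # (batch*seq,)
    )
    return loss_rtd + alpha * loss_sel
```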
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
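In data-free adversarial distillation, the generator is typically trained to maximize teacher-student disagreement while the student minimizes it; the sketch below shows that generic loop, and the dual-discriminator specifics of DDAD are an assumption.

```python
import torch
import torch.nn.functional as F

def ddad_step(G, student, teacher, opt_g, opt_s, z_dim=128, device="cuda"):
    """One adversarial distillation step without real data.

    Hedged sketch of the generic data-free adversarial distillation loop;
    DDAD's dual-discriminator details are an assumption here.
    """
    teacher.eval()

    # Generator: synthesize samples on which student and teacher disagree.
    z = torch.randn(64, z_dim, device=device)
    fake = G(z)
    loss_g = -F.l1_loss(student(fake), teacher(fake).detach())
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Student: imitate the teacher on the synthesized samples.
    fake = G(z).detach()
    loss_s = F.l1_loss(student(fake), teacher(fake).detach())
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```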
arXiv Detail & Related papers (2021-04-12T12:01:45Z)
- Training Generative Adversarial Networks in One Stage [58.983325666852856]
We introduce a general training scheme that enables training GANs efficiently in only one stage.
We show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation.
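One way to read "one stage" is that generator and discriminator gradients are obtained from a single forward-backward pass, for instance via a gradient-reversal trick; the sketch below illustrates that reading and is an assumption, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_out):
        return -grad_out

def one_stage_gan_step(G, D, real, opt, z_dim=128):
    """Single forward-backward pass updating G and D together.

    Hedged sketch: through the gradient reversal, minimizing the
    discriminator loss on fakes simultaneously pushes G adversarially.
    `opt` optimizes the parameters of both G and D.
    """
    z = torch.randn(real.size(0), z_dim, device=real.device)
    fake = GradReverse.apply(G(z))       # reverse gradients flowing into G
    d_real = D(real)
    d_fake = D(fake)
    loss = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )
    opt.zero_grad(); loss.backward(); opt.step()
```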
arXiv Detail & Related papers (2021-02-28T09:03:39Z)
- Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information [55.866673486753115]
We propose an extendable and elastic learning framework to preserve privacy and security.
The proposed framework is named Distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN).
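The privacy-preserving idea is that a central generator never sees raw data: each data site keeps a local discriminator, and only synthetic images and discriminator feedback cross the network. A hedged sketch follows; the communication pattern, batch handling, and output shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def asyndgan_round(G, local_discriminators, local_loaders, opt_g, opt_ds,
                   z_dim=128):
    """One round: central generator, per-site discriminators.

    Hedged sketch of the AsynDGAN idea; real deployments exchange synthetic
    images and gradients over RPC instead of sharing raw data. Assumes each
    loader yields image batches and each D outputs a (batch, 1) score.
    """
    g_loss = 0.0
    for D, loader, opt_d in zip(local_discriminators, local_loaders, opt_ds):
        real = next(iter(loader))            # stays at the data site
        z = torch.randn(real.size(0), z_dim)
        fake = G(z)

        # Local discriminator update (raw data never leaves the site).
        d_loss = (
            F.binary_cross_entropy_with_logits(
                D(real), torch.ones(real.size(0), 1))
            + F.binary_cross_entropy_with_logits(
                D(fake.detach()), torch.zeros(real.size(0), 1))
        )
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator feedback: only gradients w.r.t. synthetic images return.
        g_loss = g_loss + F.binary_cross_entropy_with_logits(
            D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```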
arXiv Detail & Related papers (2020-12-15T20:41:24Z)