CDFKD-MFS: Collaborative Data-free Knowledge Distillation via
Multi-level Feature Sharing
- URL: http://arxiv.org/abs/2205.11845v1
- Date: Tue, 24 May 2022 07:11:03 GMT
- Title: CDFKD-MFS: Collaborative Data-free Knowledge Distillation via
Multi-level Feature Sharing
- Authors: Zhiwei Hao, Yong Luo, Zhi Wang, Han Hu, Jianping An
- Abstract summary: We propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS).
Compared with the most competitive alternative, its accuracy is 1.18% higher on the CIFAR-100 dataset, 1.67% higher on the Caltech-101 dataset, and 2.99% higher on the mini-ImageNet dataset.
- Score: 24.794665141853905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the compression and deployment of powerful deep neural networks
(DNNs) on resource-limited edge devices to provide intelligent services have
become attractive tasks. Although knowledge distillation (KD) is a feasible
solution for compression, its requirement on the original dataset raises
privacy concerns. In addition, it is common to integrate multiple pretrained
models to achieve satisfactory performance. How to compress multiple models
into a tiny model is challenging, especially when the original data are
unavailable. To tackle this challenge, we propose a framework termed
collaborative data-free knowledge distillation via multi-level feature sharing
(CDFKD-MFS), which consists of a multi-header student module, an asymmetric
adversarial data-free KD module, and an attention-based aggregation module. In
this framework, the student model equipped with a multi-level feature-sharing
structure learns from multiple teacher models and is trained together with a
generator in an asymmetric adversarial manner. When some real samples are
available, the attention module adaptively aggregates predictions of the
student headers, which can further improve performance. We conduct extensive
experiments on three popular computer vision datasets. In particular, compared
with the most competitive alternative, the accuracy of the proposed framework
is 1.18\% higher on the CIFAR-100 dataset, 1.67\% higher on the Caltech-101
dataset, and 2.99\% higher on the mini-ImageNet dataset.
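To make the framework's moving parts concrete, here is a minimal PyTorch sketch of the general idea described in the abstract: a student backbone whose shared features feed one header per teacher, an attention module that aggregates the header predictions, and a generator trained against the student in a data-free adversarial KD loop. This is an illustration only, not the authors' implementation; the module names (MultiHeaderStudent, Generator, data_free_kd_step), architectures, dimensions, and temperature are assumptions, and the paper's asymmetric adversarial objective is simplified here to a plain min-max KL divergence.

```python
# Illustrative sketch (not the authors' code): a student backbone shares
# features across per-teacher headers, an attention module aggregates the
# header predictions, and a generator is trained adversarially against the
# student in a data-free KD loop. Architectures and dimensions are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeaderStudent(nn.Module):
    def __init__(self, num_teachers, num_classes, feat_dim=128):
        super().__init__()
        # Shared feature extractor (toy stand-in for a multi-level CNN backbone).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One lightweight header per teacher on top of the shared features.
        self.headers = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_teachers)]
        )
        # Attention module scoring each header from the shared features.
        self.attn = nn.Linear(feat_dim, num_teachers)

    def forward(self, x):
        feat = self.shared(x)
        logits = torch.stack([h(feat) for h in self.headers], dim=1)  # (B, T, C)
        weights = F.softmax(self.attn(feat), dim=1).unsqueeze(-1)     # (B, T, 1)
        aggregated = (weights * logits).sum(dim=1)                    # (B, C)
        return logits, aggregated

class Generator(nn.Module):
    def __init__(self, z_dim=100, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(nn.Linear(z_dim, 3 * img_size ** 2), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 3, self.img_size, self.img_size)

def kd_divergence(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL divergence between one header and its teacher.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean")

def data_free_kd_step(student, generator, teachers, opt_s, opt_g,
                      z_dim=100, batch_size=64):
    """One adversarial round: the generator looks for samples on which the
    student disagrees with the teachers; the student then matches each
    teacher on fresh generated samples through its corresponding header."""
    # Generator step: push generated samples toward student-teacher disagreement.
    # Teachers are assumed frozen (requires_grad=False on their parameters);
    # opt_g only updates the generator.
    fake = generator(torch.randn(batch_size, z_dim))
    logits, _ = student(fake)
    t_logits = [t(fake) for t in teachers]
    g_loss = -sum(kd_divergence(logits[:, i], t_logits[i])
                  for i in range(len(teachers)))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Student step: match every teacher on freshly generated samples.
    fake = generator(torch.randn(batch_size, z_dim)).detach()
    logits, _ = student(fake)
    with torch.no_grad():
        t_logits = [t(fake) for t in teachers]
    s_loss = sum(kd_divergence(logits[:, i], t_logits[i])
                 for i in range(len(teachers)))
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return s_loss.item(), g_loss.item()
```

When a few real labeled samples are available, as the abstract notes, the attention-based aggregation can additionally be tuned on them so that the aggregated prediction favors the headers that transfer best to the target data.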
Related papers
- Self-Regulated Data-Free Knowledge Amalgamation for Text Classification [9.169836450935724]
We develop a lightweight student network that can learn from multiple teacher models without accessing their original training data.
To accomplish this, we propose STRATANET, a modeling framework that produces text data tailored to each teacher.
We evaluate our method on three benchmark text classification datasets with varying labels or domains.
arXiv Detail & Related papers (2024-06-16T21:13:30Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Aggregating Intrinsic Information to Enhance BCI Performance through Federated Learning [29.65566062475597]
Insufficient data is a long-standing challenge for Brain-Computer Interface (BCI) to build a high-performance deep learning model.
We propose a hierarchical personalized Federated Learning EEG decoding framework to surmount this challenge.
arXiv Detail & Related papers (2023-08-14T08:59:44Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated Learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data? [6.0945220518329855]
This paper investigates the idea of combining multiple trained neural networks using unlabeled data.
To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data.
Our method supports using an arbitrary number of input models with arbitrary architectures and categories.
arXiv Detail & Related papers (2021-10-20T04:17:25Z)
- Online Ensemble Model Compression using Knowledge Distillation [51.59021417947258]
This paper presents a knowledge distillation based model compression framework consisting of a student ensemble.
It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models.
We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness.
arXiv Detail & Related papers (2020-11-15T04:46:29Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most current training schemes, the central model is refined by averaging the server model's parameters with the updated parameters from the clients.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the client models.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
- Evaluation Framework For Large-scale Federated Learning [10.127616622630514]
Federated learning is proposed as a machine learning setting to enable distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model.
In this paper, we introduce a framework designed for large-scale federated learning, which consists of approaches for dataset generation and a modular evaluation framework.
arXiv Detail & Related papers (2020-03-03T15:12:13Z)