Neural Networks Are More Productive Teachers Than Human Raters: Active
Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model
- URL: http://arxiv.org/abs/2003.13960v1
- Date: Tue, 31 Mar 2020 05:44:55 GMT
- Title: Neural Networks Are More Productive Teachers Than Human Raters: Active
Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model
- Authors: Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong
- Abstract summary: We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
- Score: 57.41841346459995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how to train a student deep neural network for visual recognition by
distilling knowledge from a blackbox teacher model in a data-efficient manner.
Progress on this problem can significantly reduce the dependence on large-scale
datasets for learning high-performing visual recognition models. There are two
major challenges. One is that the number of queries into the teacher model
should be minimized to save computational and/or financial costs. The other is
that the number of images used for the knowledge distillation should be small;
otherwise, it violates our expectation of reducing the dependence on
large-scale datasets. To tackle these challenges, we propose an approach that
blends mixup and active learning. The former effectively augments the few
unlabeled images by a big pool of synthetic images sampled from the convex hull
of the original images, and the latter actively chooses from the pool hard
examples for the student neural network and queries their labels from the teacher
model. We validate our approach with extensive experiments.
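The approach described in the abstract has two ingredients: mixup synthesizes a large pool of candidate images as convex combinations of a few unlabeled images, and active learning spends the limited query budget only on the pool images the student finds hardest, asking the blackbox teacher for their soft labels. Below is a minimal sketch of that loop, assuming a PyTorch student network and a hypothetical `blackbox_teacher` callable that returns class probabilities; it illustrates the general recipe rather than the authors' exact algorithm.

```python
# Minimal sketch of active mixup distillation from a blackbox teacher (assumed
# interface: blackbox_teacher(images) -> class probabilities). Not the authors'
# exact method; hyperparameters are placeholders.
import torch
import torch.nn.functional as F

def build_mixup_pool(images: torch.Tensor, pool_size: int) -> torch.Tensor:
    """Sample convex combinations of random image pairs (mixup augmentation)."""
    n = images.size(0)
    i = torch.randint(n, (pool_size,))
    j = torch.randint(n, (pool_size,))
    lam = torch.rand(pool_size, 1, 1, 1)            # mixing coefficients in [0, 1]
    return lam * images[i] + (1.0 - lam) * images[j]

def select_hard_examples(student, pool: torch.Tensor, budget: int) -> torch.Tensor:
    """Keep the pool images on which the student is least confident."""
    with torch.no_grad():
        probs = F.softmax(student(pool), dim=1)
    confidence = probs.max(dim=1).values            # low max-probability = hard example
    return pool[confidence.argsort()[:budget]]

def distill_round(student, blackbox_teacher, unlabeled, pool_size=10000,
                  budget=512, lr=1e-3, steps=100):
    pool = build_mixup_pool(unlabeled, pool_size)
    hard = select_hard_examples(student, pool, budget)
    soft_labels = blackbox_teacher(hard)            # the only queries that cost anything
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(steps):
        loss = F.kl_div(F.log_softmax(student(hard), dim=1), soft_labels,
                        reduction="batchmean")      # match the teacher's soft predictions
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```

Because labels are requested only for the selected hard examples, the number of teacher queries per round is capped by `budget`, which is what keeps the procedure both data- and query-efficient.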
Related papers
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks [0.0]
Large-scale deep models have achieved great success, but the enormous computational complexity and gigantic storage requirements make them difficult to implement in real-time applications.
We present a modified knowledge distillation framework which allows compressing the entire ensemble model into a weight space of a single model.
We show that knowledge distillation can aggregate knowledge from multiple teachers in only one student model and, with the same computational complexity, obtain a better-performing model than one trained in the standard manner (a minimal multi-teacher loss sketch follows this list).
arXiv Detail & Related papers (2023-02-14T17:40:36Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Efficacy of Bayesian Neural Networks in Active Learning [11.609770399591516]
We show that Bayesian neural networks are more efficient than ensemble based techniques in capturing uncertainty.
Our findings also reveal some key drawbacks of the ensemble techniques, which were recently shown to be more effective than Monte Carlo dropout.
arXiv Detail & Related papers (2021-04-02T06:02:11Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with only a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
- Application of Facial Recognition using Convolutional Neural Networks for Entry Access Control [0.0]
The paper focuses on solving the supervised classification problem of taking images of people as input and classifying the person in the image as one of the authors or not.
Two approaches are proposed: (1) building and training a neural network called WoodNet from scratch and (2) leveraging transfer learning by utilizing a network pre-trained on the ImageNet database.
The results are two models classifying the individuals in the dataset with high accuracy, achieving over 99% accuracy on held-out test data.
arXiv Detail & Related papers (2020-11-23T07:55:24Z)
- Data-Efficient Ranking Distillation for Image Retrieval [15.88955427198763]
Recent approaches use knowledge distillation to transfer knowledge from a deeper and heavier architecture to a much smaller network.
In this paper we address knowledge distillation for metric learning problems.
Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black box teacher model with access to the final output representation, and iii) a small fraction of the original training data without any ground-truth labels.
arXiv Detail & Related papers (2020-07-10T10:59:16Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
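As referenced in the multi-teacher entry above, one common way to aggregate an ensemble of teachers into a single student is to average the teachers' softened predictions and train the student to match that average. The sketch below shows such a loss in PyTorch; it is a generic illustration under that assumption, not necessarily the exact framework used in that paper, and the temperature is a placeholder choice.

```python
# Generic multi-teacher distillation loss (sketch): average the teachers' softened
# predictions and match them with the student via KL divergence. Temperature and
# the uniform teacher weighting are assumptions, not taken from the paper above.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL divergence between the student and the averaged teacher distribution."""
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)                                   # ensemble average of soft labels
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```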