An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation
- URL: http://arxiv.org/abs/2002.12597v1
- Date: Fri, 28 Feb 2020 08:46:12 GMT
- Title: An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation
- Authors: Makoto Takamoto, Yusuke Morishita, and Hitoshi Imaoka
- Abstract summary: We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in the training samples using the teacher model's predictions.
Second, we consider a multi-task network with two outputs; this makes training of the student model's feature extractor more effective.
- Score: 1.433758865948252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing deep neural network (DNN) models has become an important and
necessary technique for real-world applications, such as deploying those models
on mobile devices. Knowledge distillation is one of the most popular methods
for model compression, and many studies have been devoted to developing this
technique. However, those studies have mainly focused on classification problems,
and very few attempts have been made on regression problems, even though there
are many applications of DNNs to regression problems. In this paper, we propose a
new formalism of knowledge distillation for regression problems. First, we
propose a new loss function, the teacher outlier rejection loss, which rejects
outliers in the training samples using the teacher model's predictions. Second, we
consider a multi-task network with two outputs: one estimates the training labels,
which are in general contaminated by label noise; the other estimates the
teacher model's output, which is expected to correct the noisy labels owing to
the memorization effect. With this multi-task network, training of the student
model's feature extractor becomes more effective, and it allows us to obtain a
better student model than one trained from scratch. We performed a comprehensive
evaluation with a simple toy model, a sinusoidal function, and two open datasets,
MPIIGaze and Multi-PIE. Our results show consistent improvements in accuracy
regardless of the annotation error level in the datasets.
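To make the two ingredients concrete, below is a minimal PyTorch sketch of how they could fit together. It is an illustration under assumptions, not the paper's implementation: the names MultiTaskStudent and distillation_loss, the squared-error terms, the fixed rejection threshold reject_threshold, and all layer sizes are hypothetical choices, and the actual teacher outlier rejection loss may be defined differently.

```python
# Minimal sketch (assumptions, not the paper's exact method): outliers are taken to
# be samples whose noisy label deviates from the teacher's prediction by more than a
# threshold, and such samples are masked out of the label-regression term.
import torch
import torch.nn as nn


class MultiTaskStudent(nn.Module):
    """Student with a shared feature extractor and two regression heads:
    one estimates the (possibly noisy) training labels, the other estimates
    the teacher model's output."""

    def __init__(self, in_dim: int = 16, feat_dim: int = 32, out_dim: int = 1):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.label_head = nn.Linear(feat_dim, out_dim)    # predicts training labels
        self.teacher_head = nn.Linear(feat_dim, out_dim)  # predicts teacher output

    def forward(self, x):
        feat = self.backbone(x)
        return self.label_head(feat), self.teacher_head(feat)


def distillation_loss(label_pred, teacher_pred, noisy_labels, teacher_outputs,
                      reject_threshold: float = 1.0):
    """Hypothetical combination of a teacher-based outlier rejection term and a
    teacher-matching term."""
    # Reject samples whose label is far from the teacher's prediction.
    residual = (noisy_labels - teacher_outputs).abs()
    keep = (residual < reject_threshold).float()
    label_loss = (keep * (label_pred - noisy_labels) ** 2).sum() / keep.sum().clamp(min=1.0)
    # The second head simply regresses onto the teacher's (presumably cleaner) output.
    teacher_loss = ((teacher_pred - teacher_outputs) ** 2).mean()
    return label_loss + teacher_loss


# Usage with random data; in practice teacher_outputs come from a frozen teacher model.
student = MultiTaskStudent()
x = torch.randn(8, 16)
noisy_labels = torch.randn(8, 1)
teacher_outputs = torch.randn(8, 1)
label_pred, teacher_pred = student(x)
loss = distillation_loss(label_pred, teacher_pred, noisy_labels, teacher_outputs)
loss.backward()
```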
Related papers
- Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting [4.220336689294245]
Recent studies have presented various machine unlearning algorithms to make a trained model unlearn the data to be forgotten.
We propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preventing correlation collapse.
Our method synthesizes data samples so that the generated data distribution is far from the distribution of samples being forgotten in the feature space.
arXiv Detail & Related papers (2024-09-23T06:51:10Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Streaming Active Learning for Regression Problems Using Regression via Classification [12.572218568705376]
We propose to use the regression-via-classification framework for streaming active learning on regression problems.
Regression-via-classification transforms a regression problem into a classification problem, so streaming active learning methods designed for classification can be applied directly (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-09-02T20:24:24Z)
- Gradient Surgery for One-shot Unlearning on Generative Model [0.989293617504294]
We introduce a simple yet effective approach to remove the influence of data on a deep generative model.
Inspired by works in multi-task learning, we propose to manipulate gradients to regularize the interplay of influence among samples.
arXiv Detail & Related papers (2023-07-10T13:29:23Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- EXPANSE: A Deep Continual / Progressive Learning System for Deep Transfer Learning [1.1024591739346294]
Current DTL techniques suffer from either the catastrophic forgetting dilemma or overly biased pre-trained models.
We propose a new continual/progressive learning approach for deep transfer learning to tackle these limitations.
We offer a new way of training deep learning models inspired by the human education system.
arXiv Detail & Related papers (2022-05-19T03:54:58Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm achieves significant improvements over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- Neural Network Retraining for Model Serving [32.857847595096025]
We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
arXiv Detail & Related papers (2020-04-29T13:52:28Z)
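As referenced in the Streaming Active Learning entry above, the sketch below illustrates the general regression-via-classification idea under simple assumptions: continuous targets are discretized into equal-width bins, a standard classifier is trained on the bin indices, and predictions are mapped back to the bin centers. The helper names, the bin count, and the equal-width scheme are illustrative choices, not details from that paper.

```python
# Illustrative regression-via-classification transform (not the cited paper's exact
# method): discretize continuous targets into bins, classify, then map back.
import numpy as np


def targets_to_classes(y: np.ndarray, n_bins: int = 10):
    """Discretize continuous targets into equal-width bins.
    Returns the class index per sample and the bin centers."""
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # np.digitize returns 1..n_bins+1 here; shift and clip so the max lands in the last bin.
    classes = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return classes, centers


def classes_to_targets(classes: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map predicted class indices back to continuous values (bin centers)."""
    return centers[classes]


# Example: any streaming active-learning strategy for classification can now be
# run on (x, classes), and final predictions are decoded with classes_to_targets.
y = np.random.uniform(-1.0, 1.0, size=100)
classes, centers = targets_to_classes(y, n_bins=10)
y_hat = classes_to_targets(classes, centers)
print(float(np.abs(y - y_hat).max()))  # bounded by half the bin width
```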
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.