Learning to Reweight with Deep Interactions
- URL: http://arxiv.org/abs/2007.04649v2
- Date: Tue, 12 Jan 2021 08:07:46 GMT
- Title: Learning to Reweight with Deep Interactions
- Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian,
Tao Qin, Xiang-Yang Li
- Abstract summary: We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm achieves significant improvements over previous methods.
- Score: 104.68509759134878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the concept of teaching has been introduced into machine learning,
in which a teacher model is used to guide the training of a student model
(which will be used in real tasks) through data selection, loss function
design, etc. Learning to reweight, which is a specific kind of teaching that
reweights training data using a teacher model, receives much attention due to
its simplicity and effectiveness. In existing learning to reweight works, the
teacher model only utilizes shallow/surface information such as training
iteration number and loss/accuracy of the student model from
training/validation sets, but ignores the internal states of the student model,
which limits the potential of learning to reweight. In this work, we propose an
improved data reweighting algorithm, in which the student model provides its
internal states to the teacher model, and the teacher model returns adaptive
weights of training samples to enhance the training of the student model. The
teacher model is jointly trained with the student model using meta gradients
propagated from a validation set. Experiments on image classification with
clean/noisy labels and neural machine translation empirically demonstrate that
our algorithm achieves significant improvements over previous methods.
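The abstract describes a bi-level training loop: the teacher reads the student's internal states and per-sample losses, emits sample weights, and is itself updated with meta-gradients obtained by differentiating a validation loss through a one-step virtual update of the student. Below is a minimal sketch of that loop, assuming PyTorch; the class names, the use of penultimate features as the "internal state", and the single-step lookahead are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of teacher-guided data reweighting with meta-gradients
# (assumes PyTorch >= 2.0 for torch.func.functional_call). All names and the
# choice of internal state are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call


class Student(nn.Module):
    """Toy classifier that also exposes an internal state (penultimate features)."""
    def __init__(self, in_dim=32, hidden=64, n_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h = self.body(x)              # internal state shared with the teacher
        return self.head(h), h


class Teacher(nn.Module):
    """Maps the student's internal state and per-sample loss to a weight in (0, 1)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state, per_sample_loss):
        z = torch.cat([state, per_sample_loss.unsqueeze(1)], dim=1)
        return torch.sigmoid(self.net(z)).squeeze(1)


def meta_step(student, teacher, opt_s, opt_t, train_batch, val_batch, lr=0.1):
    x, y = train_batch
    xv, yv = val_batch

    # 1) Weighted training loss with the teacher's per-sample weights.
    logits, state = student(x)
    losses = F.cross_entropy(logits, y, reduction="none")
    weights = teacher(state.detach(), losses.detach())
    weighted_loss = (weights * losses).mean()

    # 2) Virtual one-step student update, kept differentiable w.r.t. the teacher.
    names, params = zip(*student.named_parameters())
    grads = torch.autograd.grad(weighted_loss, params, create_graph=True)
    virtual = {n: p - lr * g for n, p, g in zip(names, params, grads)}

    # 3) Validation loss of the virtual student yields the meta-gradient for the teacher.
    v_logits, _ = functional_call(student, virtual, (xv,))
    opt_t.zero_grad()
    F.cross_entropy(v_logits, yv).backward()
    opt_t.step()

    # 4) Real student update with the (now fixed) weights from the updated teacher.
    logits, state = student(x)
    losses = F.cross_entropy(logits, y, reduction="none")
    with torch.no_grad():
        weights = teacher(state, losses)
    opt_s.zero_grad()
    (weights * losses).mean().backward()
    opt_s.step()
```

The one-step lookahead is the standard approximation of the bi-level objective: rather than retraining the student to convergence for every teacher update, the teacher is scored by how a single hypothetical student step changes the validation loss.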
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - UnLearning from Experience to Avoid Spurious Correlations [3.283369870504872]
We propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE)
Our method is based on using two classification models trained in parallel: student and teacher models.
We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
arXiv Detail & Related papers (2024-09-04T15:06:44Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - L2T-DLN: Learning to Teach with Dynamic Loss Network [4.243592852049963]
In existing works, the teacher model merely determines the loss function based on the present states of the student model.
In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units.
Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher's learning and enhance the interaction between the teacher and the student model.
arXiv Detail & Related papers (2023-10-30T07:21:40Z) - Revealing Secrets From Pre-trained Models [2.0249686991196123]
Transfer-learning has been widely adopted in many emerging deep learning algorithms.
We show that pre-trained models and fine-tuned models have significantly high similarities in weight values.
We propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model.
arXiv Detail & Related papers (2022-07-19T20:19:03Z) - Forward Compatible Training for Representation Learning [53.300192863727226]
Backward compatible training (BCT) modifies the training of the new model to make its representations compatible with those of the old model.
BCT can significantly hinder the performance of the new model.
In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT)
arXiv Detail & Related papers (2021-12-06T06:18:54Z) - Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to a teacher model for the whole distillation process, and most existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models (a sketch of such a per-example weighted distillation loss appears after this list).
arXiv Detail & Related papers (2020-12-11T08:56:39Z) - MED-TEX: Transferring and Explaining Knowledge with Less Data from
Pretrained Medical Imaging Models [38.12462659279648]
A small student model is learned with less data by distilling knowledge from a cumbersome pretrained teacher model.
An explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model.
Our framework outperforms state-of-the-art methods on the knowledge distillation and model interpretation tasks on a fundus dataset.
arXiv Detail & Related papers (2020-08-06T11:50:32Z) - Efficient Learning of Model Weights via Changing Features During
Training [0.0]
We propose a machine learning model that dynamically changes its features during training.
Our main motivation is to update the model incrementally during training by replacing less descriptive features with new ones from a large pool.
arXiv Detail & Related papers (2020-02-21T12:38:14Z)
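For the Reinforced Multi-Teacher Selection entry above, the sketch below illustrates what weighting teachers per example (rather than with one fixed weight) can look like in the distillation loss, assuming PyTorch. The weight matrix is a placeholder for whatever selection policy produces it (that paper uses reinforcement learning), so this shows the shape of the loss rather than that method.

```python
# Illustrative sketch (not the paper's RL-based selection method): a multi-teacher
# distillation loss in which each teacher gets a per-example weight instead of one
# fixed weight for the whole run. Assumes PyTorch.
import torch
import torch.nn.functional as F


def weighted_multi_teacher_kd(student_logits, teacher_logits_list, weights, T=2.0):
    """
    student_logits:      (batch, classes)
    teacher_logits_list: list of (batch, classes) tensors, one per teacher
    weights:             (batch, n_teachers), non-negative, e.g. from a learned policy
    """
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for k, t_logits in enumerate(teacher_logits_list):
        p_t = F.softmax(t_logits / T, dim=-1)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1)  # per-example KL
        loss = loss + (weights[:, k] * kl).mean()
    return (T ** 2) * loss
```

Setting every weight to the same constant recovers the fixed, equal-weight baseline that the summary criticizes.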