EXPANSE: A Deep Continual / Progressive Learning System for Deep
Transfer Learning
- URL: http://arxiv.org/abs/2205.10356v1
- Date: Thu, 19 May 2022 03:54:58 GMT
- Title: EXPANSE: A Deep Continual / Progressive Learning System for Deep
Transfer Learning
- Authors: Mohammadreza Iman, John A. Miller, Khaled Rasheed, Robert M.
Branchinst, Hamid R. Arabnia
- Abstract summary: Current DTL techniques suffer from either the catastrophic forgetting dilemma or overly biased pre-trained models.
We propose a new continual/progressive learning approach for deep transfer learning to tackle these limitations.
We offer a new way of training deep learning models inspired by the human education system.
- Score: 1.1024591739346294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep transfer learning techniques try to tackle the limitations of deep
learning, the dependency on extensive training data and the training costs, by
reusing obtained knowledge. However, current DTL techniques suffer from either the
catastrophic forgetting dilemma (losing previously obtained knowledge) when
fine-tuning a pre-trained model, or an overly biased pre-trained model (harder to
adapt to the target data) when freezing part of it. Progressive learning, a
sub-category of DTL, reduces the effect
of the overly biased model in the case of freezing earlier layers by adding a
new layer to the end of a frozen pre-trained model. Even though it has been
successful in many cases, it cannot yet handle distant source and target data.
We propose a new continual/progressive learning approach for deep transfer
learning to tackle these limitations. To avoid both the catastrophic forgetting and
the overly-biased-model problems, we expand the pre-trained layers themselves
(adding new nodes to each layer) instead of only appending new layers. Hence the
method is named EXPANSE. Our experimental
results confirm that we can tackle distant source and target data using this
technique. At the same time, the final model is still valid on the source data,
achieving a promising deep continual learning approach. Moreover, we offer a
new way of training deep learning models inspired by the human education
system. We term this two-step training: learning the basics first, then adding
complexities and uncertainties. The evaluation implies that two-step training
extracts more meaningful features and finds a finer basin on the error surface,
since it achieves better accuracy than regular training.
EXPANSE (model expansion and two-step training) is a systematic continual
learning approach applicable to different problems and DL models.
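
As an illustration only (this listing contains no code from the paper), the sketch below shows one way the layer-expansion idea could look in PyTorch: a pre-trained fully connected layer is widened with new trainable nodes while its original weights stay frozen. The function and variable names are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of EXPANSE-style layer expansion (not the authors' code).
# Assumption: PyTorch; only a single fully connected layer is shown.
import torch
import torch.nn as nn

def expand_linear(old: nn.Linear, extra_out: int) -> nn.Linear:
    """Widen a pre-trained Linear layer: the first `old.out_features` output
    nodes reuse the frozen pre-trained weights, the remaining `extra_out`
    nodes are freshly initialized and trainable."""
    new = nn.Linear(old.in_features, old.out_features + extra_out)
    with torch.no_grad():
        new.weight[: old.out_features] = old.weight  # copy pre-trained rows
        new.bias[: old.out_features] = old.bias

    # Keep the pre-trained rows frozen by zeroing their gradients in backward.
    def zero_pretrained_rows(grad, n=old.out_features):
        grad = grad.clone()
        grad[:n] = 0.0
        return grad

    new.weight.register_hook(zero_pretrained_rows)
    new.bias.register_hook(zero_pretrained_rows)
    return new

# Example: widen a 128-unit hidden layer by 32 new nodes for the target task.
pretrained = nn.Linear(64, 128)           # stands in for a pre-trained layer
expanded = expand_linear(pretrained, 32)  # 160 outputs: 128 frozen, 32 trainable
```

A full EXPANSE-style expansion would also have to widen the next layer's input dimension to consume the new nodes, and, in the spirit of the two-step training described above, the frozen core would carry the "basics" learned from the source data while the added capacity is trained on the more complex target data.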
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes the limitations of existing distillation methods with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Understanding and Improving Transfer Learning of Deep Models via Neural Collapse [37.483109067209504]
This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems.
We find a strong correlation between feature collapse and downstream performance.
Our proposed fine-tuning methods deliver good performance while reducing fine-tuning parameters by at least 90%.
arXiv Detail & Related papers (2022-12-23T08:48:34Z)
- A Review of Deep Transfer Learning and Recent Advancements [1.3535770763481905]
Deep transfer learning (DTL) methods are the answer to such limitations.
DTL methods handle limited target data concerns and drastically reduce training costs.
arXiv Detail & Related papers (2022-01-19T04:19:36Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering a multi-task network, training the feature extractor of the student model becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)