EXPANSE: A Deep Continual / Progressive Learning System for Deep
Transfer Learning
- URL: http://arxiv.org/abs/2205.10356v1
- Date: Thu, 19 May 2022 03:54:58 GMT
- Title: EXPANSE: A Deep Continual / Progressive Learning System for Deep
Transfer Learning
- Authors: Mohammadreza Iman, John A. Miller, Khaled Rasheed, Robert M.
Branchinst, Hamid R. Arabnia
- Abstract summary: Current DTL techniques suffer from either the catastrophic forgetting dilemma or overly biased pre-trained models.
We propose a new continual/progressive learning approach for deep transfer learning to tackle these limitations.
We offer a new way of training deep learning models inspired by the human education system.
- Score: 1.1024591739346294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep transfer learning techniques try to tackle the limitations of deep
learning, the dependency on extensive training data and the training costs, by
reusing obtained knowledge. However, current DTL techniques suffer from either the
catastrophic forgetting dilemma (losing previously obtained knowledge) when
fine-tuning a pre-trained model, or an overly biased pre-trained model (harder to
adapt to the target data) when freezing part of it. Progressive learning, a
sub-category of DTL, reduces the effect
of the overly biased model in the case of freezing earlier layers by adding a
new layer to the end of a frozen pre-trained model. Even though it has been
successful in many cases, it cannot yet handle distant source and target data.
We propose a new continual/progressive learning approach for deep transfer
learning to tackle these limitations. To avoid both the catastrophic forgetting and
the overly-biased-model problems, we expand the pre-trained layers themselves
(adding new nodes to each layer) instead of only appending new layers. Hence the
method is named EXPANSE. Our experimental
results confirm that we can tackle distant source and target data using this
technique. At the same time, the final model is still valid on the source data,
achieving a promising deep continual learning approach. Moreover, we offer a
new way of training deep learning models inspired by the human education
system. We term this two-step training: learning the basics first, then adding
complexities and uncertainties. The evaluation implies that two-step training
extracts more meaningful features and finds a finer basin on the error surface,
since it achieves better accuracy than regular training.
EXPANSE (model expansion and two-step training) is a systematic continual
learning approach applicable to different problems and DL models.
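
As an illustration only (this listing contains no code from the paper), the sketch below shows one way the layer-expansion idea could look in PyTorch: a pre-trained fully connected layer is widened with new trainable nodes while its original weights stay frozen. The function and variable names are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of EXPANSE-style layer expansion (not the authors' code).
# Assumption: PyTorch; only a single fully connected layer is shown.
import torch
import torch.nn as nn

def expand_linear(old: nn.Linear, extra_out: int) -> nn.Linear:
    """Widen a pre-trained Linear layer: the first `old.out_features` output
    nodes reuse the frozen pre-trained weights, the remaining `extra_out`
    nodes are freshly initialized and trainable."""
    new = nn.Linear(old.in_features, old.out_features + extra_out)
    with torch.no_grad():
        new.weight[: old.out_features] = old.weight  # copy pre-trained rows
        new.bias[: old.out_features] = old.bias

    # Keep the pre-trained rows frozen by zeroing their gradients in backward.
    def zero_pretrained_rows(grad, n=old.out_features):
        grad = grad.clone()
        grad[:n] = 0.0
        return grad

    new.weight.register_hook(zero_pretrained_rows)
    new.bias.register_hook(zero_pretrained_rows)
    return new

# Example: widen a 128-unit hidden layer by 32 new nodes for the target task.
pretrained = nn.Linear(64, 128)           # stands in for a pre-trained layer
expanded = expand_linear(pretrained, 32)  # 160 outputs: 128 frozen, 32 trainable
```

A full EXPANSE-style expansion would also have to widen the next layer's input dimension to consume the new nodes, and, in the spirit of the two-step training described above, the frozen core would carry the "basics" learned from the source data while the added capacity is trained on the more complex target data.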
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes the limitations of existing distillation methods with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Understanding and Improving Transfer Learning of Deep Models via Neural Collapse [37.483109067209504]
This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems.
We find a strong correlation between feature collapse and downstream performance.
Our proposed fine-tuning methods deliver good performance while reducing fine-tuning parameters by at least 90%.
arXiv Detail & Related papers (2022-12-23T08:48:34Z)
- A Review of Deep Transfer Learning and Recent Advancements [1.3535770763481905]
Deep transfer learning (DTL) methods are the answer to such limitations.
DTL methods handle limited target data concerns and drastically reduce training costs.
arXiv Detail & Related papers (2022-01-19T04:19:36Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering a multi-task network, training the feature extractor of the student model becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)