Lifelong Learning with Searchable Extension Units
- URL: http://arxiv.org/abs/2003.08559v1
- Date: Thu, 19 Mar 2020 03:45:51 GMT
- Title: Lifelong Learning with Searchable Extension Units
- Authors: Wenjin Wang, Yunqing Hu, Yin Zhang
- Abstract summary: We propose a new lifelong learning framework named Searchable Extension Units (SEU)
It removes the need for a predefined original model and searches for task-specific extension units.
Our approach can obtain a much more compact model without catastrophic forgetting.
- Score: 21.17631355880764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lifelong learning remains an open problem, and one of its main
difficulties is catastrophic forgetting. Many dynamic expansion approaches
have been proposed to address this problem, but they all use homogeneous
models with a predefined structure for every task. A common original model
and common expansion structures ignore the fact that different tasks require
different model structures, which leads to a less compact multi-task model
and causes the model size to grow rapidly as the number of tasks increases.
Moreover, such a fixed structure cannot perform best on all tasks. To solve
these problems, we propose a new lifelong learning framework named Searchable
Extension Units (SEU), which introduces Neural Architecture Search into
lifelong learning. SEU removes the need for a predefined original model and
searches for task-specific extension units, without compromising the model's
performance on the individual tasks. Our approach obtains a much more compact
model without catastrophic forgetting. Experimental results on the PMNIST,
split CIFAR10, split CIFAR100, and Mixture datasets show that our method
achieves higher accuracy with a much smaller model, whose size is about
25-33% of that of state-of-the-art methods.
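To make the idea above concrete, here is a minimal, hypothetical sketch of searching a task-specific extension unit and then freezing it, written in PyTorch. The search space, the unit builder, and the `add_task` interface are illustrative assumptions, not the authors' implementation; the paper searches richer architectures with Neural Architecture Search rather than the brute-force loop shown here.

```python
import torch
import torch.nn as nn

# Assumed search space of extension-unit configurations (hidden widths
# of a small MLP block); the paper searches richer cell structures.
SEARCH_SPACE = [(64,), (128,), (64, 64), (128, 64)]


def build_unit(in_dim, hidden_dims, out_dim):
    """Build one candidate extension unit as a small MLP."""
    layers, d = [], in_dim
    for h in hidden_dims:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class SEUModel(nn.Module):
    """Toy lifelong learner: one searched extension unit per task."""

    def __init__(self, in_dim):
        super().__init__()
        self.in_dim = in_dim
        self.units = nn.ModuleList()  # frozen units of earlier tasks

    def add_task(self, train_fn, eval_fn, out_dim):
        # Search: train each candidate unit briefly, keep the best one.
        best_unit, best_score = None, float("-inf")
        for hidden_dims in SEARCH_SPACE:
            unit = build_unit(self.in_dim, hidden_dims, out_dim)
            train_fn(unit)         # caller-supplied task-specific training
            score = eval_fn(unit)  # e.g. validation accuracy
            if score > best_score:
                best_unit, best_score = unit, score
        # Freeze the selected unit so later tasks cannot overwrite it;
        # this is what prevents catastrophic forgetting in this sketch.
        for p in best_unit.parameters():
            p.requires_grad_(False)
        self.units.append(best_unit)

    def forward(self, x, task_id):
        return self.units[task_id](x)
```

The brute-force loop only conveys the search-then-freeze structure; the actual framework selects units with a NAS procedure and can reuse features from previously learned units.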
Related papers
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model allows all the tasks to be executed concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks [12.146530928616386]
A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks.
We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined weight set that guides model adaptation within the weight space of a pre-trained model.
arXiv Detail & Related papers (2023-12-11T19:10:55Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models that are fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning [5.760774528950479]
Recent expansion-based models show promising results for task incremental learning (TIL).
For class incremental learning (CIL), predicting the task id is a crucial challenge.
We propose a robust task prediction method that leverages entropy-weighted data augmentations and the model's gradient using pseudo labels.
arXiv Detail & Related papers (2023-12-02T17:28:52Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- ZipIt! Merging Models from Different Tasks without Training [20.2479633507354]
"ZipIt!" is a general method for merging two arbitrary models of the same architecture.
We find that these two changes combined account for 20-60% improvement over prior work.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a minimal weight-merging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- Shared and Private VAEs with Generative Replay for Continual Learning [1.90365714903665]
Continual learning tries to learn new tasks without forgetting previously learned ones.
Most existing artificial neural network (ANN) models fail at this, whereas humans manage it by remembering previous tasks throughout their lives.
We show our hybrid model effectively avoids forgetting and achieves state-of-the-art results on visual continual learning benchmarks such as the MNIST, Permuted MNIST (QMNIST), CIFAR100, and miniImageNet datasets.
arXiv Detail & Related papers (2021-05-17T06:18:36Z)
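Several of the entries above merge task-specific models directly in parameter space (e.g. Dataless Knowledge Fusion, Model Breadcrumbs, ZipIt!). As a point of reference, the sketch below shows the simplest form of that idea, uniform weight averaging of models that share an architecture. It is an assumed baseline for illustration only, not the specific algorithm of any of the cited papers, which use more refined rules such as regression-based merging, sparse weight deltas, or feature matching.

```python
import copy
import torch
import torch.nn as nn


def merge_in_parameter_space(models, weights=None):
    """Fuse same-architecture models by averaging their parameters.

    A toy stand-in for dataless, parameter-space knowledge fusion;
    `weights` lets the caller weight each source model.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    merged = copy.deepcopy(models[0])
    source_params = [dict(m.named_parameters()) for m in models]
    with torch.no_grad():
        for name, p_merged in merged.named_parameters():
            fused = sum(w * sp[name].detach()
                        for sp, w in zip(source_params, weights))
            p_merged.copy_(fused)
    return merged


# Example: merge two fine-tuned copies of the same small network.
if __name__ == "__main__":
    base = nn.Linear(4, 2)
    finetuned_a, finetuned_b = copy.deepcopy(base), copy.deepcopy(base)
    merged = merge_in_parameter_space([finetuned_a, finetuned_b])
    print(merged.weight.shape)  # torch.Size([2, 4])
```

In practice the merged model would be validated per task; non-uniform or per-layer weighting is a common refinement.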
This list is automatically generated from the titles and abstracts of the papers on this site.