Continual Learning with Transformers for Image Classification
- URL: http://arxiv.org/abs/2206.14085v1
- Date: Tue, 28 Jun 2022 15:30:10 GMT
- Title: Continual Learning with Transformers for Image Classification
- Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric
Archambeau
- Abstract summary: In computer vision, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past.
We validate Adaptive Distillation of Adapters (ADA), a method developed to perform continual learning with pre-trained Transformers and Adapters.
We empirically demonstrate on different classification tasks that this method maintains good predictive performance without retraining the model.
- Score: 12.028617058465333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real-world scenarios, data to train machine learning models
becomes available over time. However, neural network models struggle to
continually learn new concepts without forgetting what has been learnt in the
past. This phenomenon is known as catastrophic forgetting, and it is often
difficult to prevent due to practical constraints, such as the amount of data
that can be stored or the limited computational resources that can be used.
Moreover, training large neural networks, such as Transformers, from scratch is
very costly and requires a vast amount of training data, which might not be
available in the application domain of interest. A recent trend indicates that
dynamic architectures based on parameter expansion can efficiently reduce
catastrophic forgetting in continual learning, but they require complex tuning
to balance the growing number of parameters and barely share any information
across tasks. As a result, they struggle to scale to a large number of tasks
without significant overhead. In this paper, we validate in the computer vision
domain a recent solution called Adaptive Distillation of Adapters (ADA), which
was developed to perform continual learning using pre-trained Transformers and
Adapters on text classification tasks. We empirically demonstrate on different
classification tasks that this method maintains good predictive performance
without retraining the model or increasing the number of model parameters over
time. Moreover, it is significantly faster at inference time than
state-of-the-art methods.
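The mechanism the paper builds on is adapter-based tuning of a frozen pre-trained Transformer: only small bottleneck adapters and a classification head are trained per task, and ADA's distillation of adapters keeps the parameter count from growing over time. Below is a minimal PyTorch sketch of the frozen-backbone-plus-adapter idea only; the toy encoder, layer sizes, pooling, and head are assumptions for illustration, not the authors' ADA implementation.

```python
# Illustrative sketch (not the authors' ADA code): a frozen Transformer
# encoder with one small trainable bottleneck adapter per layer and a task head.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedEncoder(nn.Module):
    """Frozen Transformer encoder; only adapters and the head are trainable."""
    def __init__(self, dim: int = 256, depth: int = 4, n_classes: int = 10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():      # pre-trained weights stay frozen
            p.requires_grad = False
        self.adapters = nn.ModuleList(Adapter(dim) for _ in range(depth))
        self.head = nn.Linear(dim, n_classes)     # per-task classification head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = tokens
        for block, adapter in zip(self.backbone.layers, self.adapters):
            x = adapter(block(x))                 # adapter refines each frozen block's output
        return self.head(x.mean(dim=1))           # mean-pool tokens, then classify


model = AdaptedEncoder()
patch_tokens = torch.randn(2, 16, 256)            # (batch, tokens, embedding dim)
logits = model(patch_tokens)                      # torch.Size([2, 10])
```

In a continual-learning loop, a fresh adapter/head pair would be trained on each new task while the backbone stays fixed; ADA's distillation of adapters (not sketched here) is what keeps the number of parameters constant across tasks.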
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can similarly provide useful premonitions.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Task Arithmetic with LoRA for Continual Learning [0.0]
We propose a novel method to continually train vision models using low-rank adaptation and task arithmetic (a rough sketch of this idea appears after this list).
When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning.
arXiv Detail & Related papers (2023-11-04T15:12:24Z)
- PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- Cooperative data-driven modeling [44.99833362998488]
Data-driven modeling in mechanics is evolving rapidly based on recent machine learning advances.
New data and models created by different groups become available, opening possibilities for cooperative modeling.
Artificial neural networks suffer from catastrophic forgetting, i.e. they forget how to perform an old task when trained on a new one.
This hinders cooperation because adapting an existing model for a new task affects the performance on a previous task trained by someone else.
arXiv Detail & Related papers (2022-11-23T14:27:25Z)
- Improving generalization with synthetic training data for deep learning based quality inspection [0.0]
Supervised deep learning requires a large number of annotated images for training.
In practice, collecting and annotating such data is costly and laborious.
We show that the use of randomly generated synthetic training images can help tackle domain instability.
arXiv Detail & Related papers (2022-02-25T16:51:01Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Lambda Learner: Fast Incremental Learning on Data Streams [5.543723668681475]
We propose a new framework for training models by incremental updates in response to mini-batches from data streams.
We show that the resulting model of our framework closely estimates a periodically updated model trained on offline data and outperforms it when model updates are time-sensitive.
We present a large-scale deployment on the sponsored content platform for a large social network.
arXiv Detail & Related papers (2020-10-11T04:00:34Z)
- Learning to Transfer Dynamic Models of Underactuated Soft Robotic Hands [15.481728234509227]
Transfer learning is a popular approach to bypassing data limitations in one domain by leveraging data from another domain.
We show that in some situations this can lead to significantly worse performance than simply using the transferred model without adaptation.
We derive an upper bound on the Lyapunov exponent of a trained transition model, and demonstrate two approaches that make use of this insight.
arXiv Detail & Related papers (2020-05-21T01:46:59Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
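For the task-arithmetic-with-LoRA entry above, the core idea can be shown in a short sketch: each task is summarised by a low-rank weight update, and the updates from several tasks are summed onto the frozen pre-trained weight. This is an assumed illustration, not that paper's code; the shapes, rank, and scaling factor below are arbitrary placeholders.

```python
# Hedged sketch of task arithmetic with low-rank (LoRA-style) updates.
# Not the paper's implementation; shapes, rank, and scaling are assumptions.
import torch


def low_rank_task_vector(out_dim: int, in_dim: int, rank: int = 4) -> torch.Tensor:
    """A toy per-task update: in practice A and B would be trained on that task."""
    A = torch.randn(rank, in_dim) * 0.01   # LoRA down-projection
    B = torch.randn(out_dim, rank) * 0.01  # LoRA up-projection
    return B @ A                           # low-rank delta with the weight's shape


base_weight = torch.randn(128, 64)         # frozen pre-trained layer weight
task_vectors = [low_rank_task_vector(128, 64) for _ in range(3)]  # one per task

# Task arithmetic: sum the (optionally scaled) per-task deltas onto the base.
scale = 1.0
merged_weight = base_weight + scale * sum(task_vectors)
print(merged_weight.shape)                 # torch.Size([128, 64])
```

In practice the per-task factors would be learned with the base weight frozen; here they are random placeholders to keep the sketch self-contained.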