In Defense of the Learning Without Forgetting for Task Incremental
Learning
- URL: http://arxiv.org/abs/2107.12304v1
- Date: Mon, 26 Jul 2021 16:23:13 GMT
- Title: In Defense of the Learning Without Forgetting for Task Incremental
Learning
- Authors: Guy Oren and Lior Wolf
- Abstract summary: Catastrophic forgetting is one of the major challenges on the road for continual learning systems.
This paper shows that, with the right architecture and a standard set of augmentations, the results obtained by LwF surpass the latest algorithms for the task-incremental scenario.
- Score: 91.3755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting is one of the major challenges on the road for
continual learning systems, which are presented with an on-line stream of
tasks. The field has attracted considerable interest and a diverse set of
methods have been presented for overcoming this challenge. Learning without
Forgetting (LwF) is one of the earliest and most frequently cited methods. It
has the advantages of not requiring the storage of samples from the previous
tasks, of implementation simplicity, and of being well-grounded by relying on
knowledge distillation. However, the prevailing view is that while it shows a
relatively small amount of forgetting when only two tasks are introduced, it
fails to scale to long sequences of tasks. This paper challenges that view by
showing that, with the right architecture and a standard set of augmentations,
the results obtained by LwF surpass the latest algorithms for the
task-incremental scenario. This improved performance is demonstrated by an
extensive set of experiments over CIFAR-100 and Tiny-ImageNet, where it is also
shown that other methods cannot benefit as much from similar improvements.
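For readers unfamiliar with the mechanism, LwF regularizes learning of each new task by distilling the frozen previous model's outputs on the current task's inputs, which is why no samples from earlier tasks need to be stored. The sketch below illustrates that idea in PyTorch-style code; the multi-head interface (model(x) returning a dict of per-task logits), the loss weighting, and the temperature value are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F


def lwf_distillation_loss(new_logits, old_logits, temperature=2.0):
    """Temperature-scaled distillation term in the spirit of LwF.

    `old_logits` come from a frozen copy of the model saved before the
    current task; both sets of logits are computed on current-task inputs,
    so no samples from earlier tasks are required.
    """
    log_p_new = F.log_softmax(new_logits / temperature, dim=1)
    p_old = F.softmax(old_logits / temperature, dim=1)
    # Scaling by T^2 is the usual knowledge-distillation convention.
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * temperature ** 2


def lwf_step(model, old_model, x, y, task_id, kd_weight=1.0):
    """One training step: cross-entropy on the new task's head plus
    distillation on every previously learned head.

    The multi-head interface (model(x) -> dict of per-task logits) and the
    single kd_weight are assumptions made for this sketch.
    """
    new_out = model(x)
    with torch.no_grad():
        old_out = old_model(x)  # frozen snapshot taken before this task
    loss = F.cross_entropy(new_out[task_id], y)
    for t in old_out:  # pull each old head toward its previous outputs
        loss = loss + kd_weight * lwf_distillation_loss(new_out[t], old_out[t])
    return loss
```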
Related papers
- Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners [19.579098962615795]
Few-Shot Class Incremental Learning (FSCIL) is a task that requires a model to learn new classes incrementally without forgetting when only a few samples for each class are given.
FSCIL encounters two significant challenges: catastrophic forgetting and overfitting.
We argue that large models such as vision and language transformers pre-trained on large datasets can be excellent few-shot incremental learners.
arXiv Detail & Related papers (2024-04-02T17:23:22Z) - MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning
for Multimodal Video Captioning [10.95493493610559]
We propose a method to Mitigate Catastrophic Forgetting in class-incremental learning for multimodal Video Captioning (MCF-VC).
To better constrain the knowledge characteristics of old and new tasks at the specific feature level, we introduce Two-stage Knowledge Distillation (TsKD).
Our experiments on the public dataset MSR-VTT show that the proposed method significantly resists the forgetting of previous tasks without replaying old samples, and performs well on the new task.
arXiv Detail & Related papers (2024-02-27T16:54:08Z) - Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class
Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z) - Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z) - First-order ANIL provably learns representations despite overparametrization [21.74339210788053]
This work shows that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations.
Having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution.
Overall, this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.
arXiv Detail & Related papers (2023-03-02T15:13:37Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - Relational Experience Replay: Continual Learning by Adaptively Tuning
Task-wise Relationship [54.73817402934303]
We propose Relational Experience Replay (RER), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
RER can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z) - Learning from Guided Play: A Scheduled Hierarchical Approach for
Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z) - Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z) - Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient (a short sketch of this projection follows the list below).
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
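Since the projection rule in the Gradient Surgery entry above is stated concretely, here is a minimal hedged sketch of that step. It operates on flattened per-task gradient vectors (an assumed interface) and is not the paper's reference implementation.

```python
import torch


def gradient_surgery(task_grads):
    """Project away conflicting gradient components (in the spirit of PCGrad).

    `task_grads` is a list of flattened per-task gradient vectors; the flat
    interface is an assumption made for illustration.
    """
    surgered = [g.clone() for g in task_grads]
    for i, g_i in enumerate(surgered):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = torch.dot(g_i, g_j)
            if dot < 0:  # conflict: remove g_i's component along g_j
                g_i -= dot / (g_j.norm() ** 2 + 1e-12) * g_j
    # The projected gradients are summed and applied as a single update.
    return torch.stack(surgered).sum(dim=0)
```

In a full training loop, the returned vector would be reshaped back into per-parameter gradients before the optimizer step.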
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.