Related papers: Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation

URL: http://arxiv.org/abs/2404.00417v1
Date: Sat, 30 Mar 2024 16:53:10 GMT
Title: Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Authors: HongWei Yan, Liyuan Wang, Kaisheng Ma, Yi Zhong
Abstract summary: Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL in a one-pass data stream. We introduce a novel approach, Multi-level Online Sequential Experts (MOSE). MOSE cultivates the model as stacked sub-experts, integrating multi-level supervision and reverse self-distillation.
Score: 38.39340194054917
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To accommodate real-world dynamics, artificial intelligence systems need to cope with sequentially arriving content in an online manner. Beyond regular Continual Learning (CL) attempting to address catastrophic forgetting with offline training of each task, Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL in a one-pass data stream. Current OCL methods primarily rely on memory replay of old training samples. However, a notable gap from CL to OCL stems from the additional overfitting-underfitting dilemma associated with the use of rehearsal buffers: the inadequate learning of new training samples (underfitting) and the repeated learning of a few old training samples (overfitting). To this end, we introduce a novel approach, Multi-level Online Sequential Experts (MOSE), which cultivates the model as stacked sub-experts, integrating multi-level supervision and reverse self-distillation. Supervision signals across multiple stages facilitate appropriate convergence of the new task while gathering various strengths from experts by knowledge distillation mitigates the performance decline of old tasks. MOSE demonstrates remarkable efficacy in learning new samples and preserving past knowledge through multi-level experts, thereby significantly advancing OCL performance over state-of-the-art baselines (e.g., up to 7.3% on Split CIFAR-100 and 6.1% on Split Tiny-ImageNet).
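The combination of multi-level supervision and reverse self-distillation described in the abstract can be illustrated with a small loss computation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of feature-level mean squared error, the shallow-to-deep direction of distillation, and the `distill_weight` parameter are all hypothetical choices made for illustration, and the features of every expert are assumed to already share one dimension.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class.
    return -math.log(softmax(logits)[label])

def mse(a, b):
    # Mean squared error between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def multi_level_loss(expert_logits, expert_feats, label, distill_weight=1.0):
    """Hypothetical combined objective for one sample.

    expert_logits: one logit vector per stacked sub-expert, shallow to deep.
    expert_feats:  one feature vector per sub-expert (assumed same size).
    """
    # Multi-level supervision: each expert receives its own classification loss.
    supervision = sum(cross_entropy(z, label) for z in expert_logits)
    # Reverse self-distillation (assumed form): the shallower experts act as
    # fixed teachers, and the deepest expert is pulled toward their features.
    final_feat = expert_feats[-1]
    distill = sum(mse(final_feat, f) for f in expert_feats[:-1])
    return supervision + distill_weight * distill

# Two experts, two classes, true label 0.
loss = multi_level_loss(
    expert_logits=[[0.0, 0.0], [0.0, 0.0]],
    expert_feats=[[0.0, 0.0], [1.0, 1.0]],
    label=0,
    distill_weight=0.5,
)
print(round(loss, 4))
```

In a real training loop the teacher features would be excluded from gradient computation and the loss would be averaged over a batch that mixes new samples with replayed ones; both details are omitted here to keep the sketch dependency-free.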

Related papers

Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network. Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks. The MoE model has recently been shown to effectively mitigate catastrophic forgetting in CL by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z)
DELTA: Decoupling Long-Tailed Online Continual Learning [7.507868991415516]
Long-Tailed Online Continual Learning (LTOCL) aims to learn new tasks from sequentially arriving class-imbalanced data streams. We present DELTA, a decoupled learning approach designed to enhance learning representations. We demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods.
arXiv Detail & Related papers (2024-04-06T02:33:04Z)
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning [54.68180752416519]
Panoptic segmentation is a cutting-edge computer vision task. We introduce a novel and efficient method for continual panoptic segmentation based on Visual Prompt Tuning, dubbed ECLIPSE. Our approach involves freezing the base model parameters and fine-tuning only a small set of prompt embeddings, addressing both catastrophic forgetting and plasticity.
arXiv Detail & Related papers (2024-03-29T11:31:12Z)
Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning [52.046037471678005]
We focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories. We propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning.
arXiv Detail & Related papers (2023-12-27T04:40:12Z)
Metalearning Continual Learning Algorithms [42.710124929514066]
We propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn continual (meta)learning algorithms. ACL encodes continual learning (CL) desiderata -- good performance on both old and new tasks -- into its metalearning objectives. Our experiments demonstrate that ACL effectively resolves "in-context catastrophic forgetting," a problem that naive in-context learning algorithms suffer from.
arXiv Detail & Related papers (2023-12-01T01:25:04Z)
Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning [22.067640536948545]
Continuous unsupervised representation learning (CURL) research has greatly benefited from improvements in self-supervised learning (SSL) techniques. Existing CURL methods using SSL can learn high-quality representations without any labels, but with a notable performance drop when learning on a many-tasks data stream. We propose to train an expert network that is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks.
arXiv Detail & Related papers (2023-09-12T09:31:34Z)
On the Effectiveness of Equivariant Regularization for Robust Online Continual Learning [17.995662644298974]
Continual Learning (CL) approaches seek to bridge this gap by facilitating the transfer of knowledge to both previous tasks and future ones. Recent research has shown that self-supervision can produce versatile models that can generalize well to diverse downstream tasks. We propose Continual Learning via Equivariant Regularization (CLER), an OCL approach that leverages equivariant tasks for self-supervision.
arXiv Detail & Related papers (2023-05-05T16:10:31Z)
Kaizen: Practical Self-supervised Continual Learning with Continual Fine-tuning [21.36130180647864]
Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. We introduce a training architecture that is able to mitigate catastrophic forgetting. Kaizen significantly outperforms previous SSL models in competitive vision benchmarks.
arXiv Detail & Related papers (2023-03-30T09:08:57Z)
Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable a model to learn from a non-stationary data stream, continuously acquiring new knowledge while retaining what it has already learnt. The main challenge comes from the "catastrophic forgetting" issue -- the inability to retain learnt knowledge while learning new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations. This paper introduces a new family of adversarial examples for contrastive learning and uses these examples to define a new adversarial training algorithm for SSL, denoted as CLAE.
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
Bilevel Continual Learning [76.50127663309604]
We present a novel framework of continual learning named "Bilevel Continual Learning" (BCL). Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.