In-context Continual Learning Assisted by an External Continual Learner
- URL: http://arxiv.org/abs/2412.15563v1
- Date: Fri, 20 Dec 2024 04:44:41 GMT
- Title: In-context Continual Learning Assisted by an External Continual Learner
- Authors: Saleh Momeni, Sahisnu Mazumder, Zixuan Ke, Bing Liu
- Abstract summary: Existing continual learning (CL) methods rely on fine-tuning or adapting large language models (LLMs).
We introduce InCA, a novel approach that integrates an external continual learner (ECL) with ICL to enable scalable CL without CF.
- Score: 19.382196203113836
- Abstract: Existing continual learning (CL) methods mainly rely on fine-tuning or adapting large language models (LLMs). They still suffer from catastrophic forgetting (CF). Little work has been done to exploit in-context learning (ICL) to leverage the extensive knowledge within LLMs for CL without updating any parameters. However, incrementally learning each new task in ICL necessitates adding training examples from each class of the task to the prompt, which hampers scalability as the prompt length increases. This issue not only leads to excessively long prompts that exceed the input token limit of the underlying LLM but also degrades the model's performance due to the overextended context. To address this, we introduce InCA, a novel approach that integrates an external continual learner (ECL) with ICL to enable scalable CL without CF. The ECL is built incrementally to pre-select a small subset of likely classes for each test instance. By restricting the ICL prompt to only these selected classes, InCA prevents prompt lengths from becoming excessively long, while maintaining high performance. Experimental results demonstrate that InCA significantly outperforms existing CL baselines, achieving substantial performance gains.
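The pre-selection idea in the abstract can be illustrated with a small sketch. The abstract does not say how the ECL is implemented, so everything below is an assumption made for illustration: the class name `ExternalContinualLearner`, the token-count class prototypes, the cosine ranking, and the prompt layout in `build_prompt` are hypothetical stand-ins, not the authors' method. The sketch only shows the general shape: per-class statistics are updated incrementally without touching other classes, the top-k likely classes are chosen for a test instance, and the ICL prompt contains examples from those classes only.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Toy representation: lowercase token counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExternalContinualLearner:
    """Hypothetical ECL: one token-count prototype per class, updated incrementally."""

    def __init__(self, shots_per_class=2):
        self.prototypes = defaultdict(Counter)  # class label -> summed token counts
        self.examples = defaultdict(list)       # class label -> stored prompt examples
        self.shots_per_class = shots_per_class

    def learn_task(self, labelled_texts):
        """Add a new task; statistics of classes not in this task are left untouched."""
        for text, label in labelled_texts:
            self.prototypes[label].update(bag_of_words(text))
            if len(self.examples[label]) < self.shots_per_class:
                self.examples[label].append(text)

    def top_k(self, text, k):
        """Return the k classes whose prototypes are most similar to the input."""
        query = bag_of_words(text)
        ranked = sorted(
            self.prototypes,
            key=lambda label: cosine(query, self.prototypes[label]),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(ecl, text, k=2):
    """Build an ICL prompt restricted to the k pre-selected classes."""
    blocks = []
    for label in ecl.top_k(text, k):
        for example in ecl.examples[label]:
            blocks.append(f"Text: {example}\nClass: {label}")
    blocks.append(f"Text: {text}\nClass:")
    return "\n\n".join(blocks)


ecl = ExternalContinualLearner()
ecl.learn_task([
    ("book a flight to paris", "travel"),
    ("reserve a hotel room", "travel"),
    ("transfer money to savings", "banking"),
    ("check my account balance", "banking"),
])
ecl.learn_task([
    ("play some jazz music", "music"),
    ("what is the weather today", "weather"),
])
print(build_prompt(ecl, "book a flight to london", k=2))
```

With this layout the prompt length depends on `k` and `shots_per_class` rather than on the total number of classes learned so far, which is the scalability property the abstract describes. The resulting prompt would then be passed to an LLM, a step omitted here.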
Related papers
- Continual Learning Using Only Large Language Model Prompting [13.987306383667518]
We introduce CLOB, a novel continual learning paradigm wherein a large language model (LLM) is regarded as a black box.
We also propose a new CL technique, called CIS, based on incremental summarization that also overcomes the LLM's input length limit.
arXiv Detail & Related papers (2024-12-20T01:21:57Z)
- ICLEval: Evaluating In-Context Learning Ability of Large Language Models [68.7494310749199]
In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs.
Existing evaluation frameworks primarily focus on language abilities and knowledge, often overlooking the assessment of ICL ability.
We introduce the ICLEval benchmark to evaluate the ICL abilities of LLMs, which encompasses two key sub-abilities: exact copying and rule learning.
arXiv Detail & Related papers (2024-06-21T08:06:10Z)
- Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z)
- CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [23.398619576886375]
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned.
Our work proposes Continual LeArning with Probabilistic finetuning (CLAP) - a probabilistic modeling framework over visual-guided text features per task.
arXiv Detail & Related papers (2024-03-28T04:15:58Z)
- Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning [67.25698169440818]
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL).
We train the same LLMs with the same demonstration examples via ICL and supervised learning (SL), respectively, and investigate their performance under label perturbations.
First, we find that gold labels have significant impacts on the downstream in-context performance, especially for large language models.
Second, when comparing with SL, we show empirically that ICL is less sensitive to label perturbations than SL, and ICL gradually attains comparable performance to SL as the model size increases.
arXiv Detail & Related papers (2023-07-28T09:03:19Z)
- On the Effectiveness of Equivariant Regularization for Robust Online Continual Learning [17.995662644298974]
Continual Learning (CL) approaches seek to bridge this gap by facilitating the transfer of knowledge to both previous tasks and future ones.
Recent research has shown that self-supervision can produce versatile models that can generalize well to diverse downstream tasks.
We propose Continual Learning via Equivariant Regularization (CLER), an OCL approach that leverages equivariant tasks for self-supervision.
arXiv Detail & Related papers (2023-05-05T16:10:31Z)
- OpenICL: An Open-Source Framework for In-context Learning [48.75452105457122]
We introduce OpenICL, an open-source toolkit for In-context Learning (ICL) and large language model evaluation.
OpenICL is research-friendly, with a highly flexible architecture that lets users easily combine different components to suit their needs.
The effectiveness of OpenICL has been validated on a wide range of NLP tasks, including classification, QA, machine translation, and semantic parsing.
arXiv Detail & Related papers (2023-03-06T06:20:25Z)
- Beyond Supervised Continual Learning: a Review [69.9674326582747]
Continual Learning (CL) is a flavor of machine learning where the usual assumption of stationary data distribution is relaxed or omitted.
Changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge.
This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning.
arXiv Detail & Related papers (2022-08-30T14:44:41Z)
- Learning with Multiple Complementary Labels [94.8064553345801]
A complementary label (CL) simply indicates an incorrect class of an example, but learning with CLs results in multi-class classifiers.
We propose a novel problem setting that allows multiple complementary labels (MCLs) for each example, and two ways of learning with MCLs.
arXiv Detail & Related papers (2019-12-30T13:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.