Recent Advances of Foundation Language Models-based Continual Learning: A Survey
- URL: http://arxiv.org/abs/2405.18653v2
- Date: Fri, 29 Nov 2024 16:19:01 GMT
- Title: Recent Advances of Foundation Language Models-based Continual Learning: A Survey
- Authors: Yutao Yang, Jie Zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Yuan Xie, Liang He
- Abstract summary: Foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV).
However, they cannot emulate human-like continuous learning due to catastrophic forgetting.
Various continual learning (CL)-based methodologies have been developed to refine LMs, enabling them to adapt to new tasks without forgetting previous knowledge.
- Score: 31.171203978742447
- Abstract: Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV). Unlike traditional neural network models, foundation LMs obtain a great ability for transfer learning by acquiring rich commonsense knowledge through pre-training on extensive unsupervised datasets with a vast number of parameters. However, they still cannot emulate human-like continuous learning due to catastrophic forgetting. Consequently, various continual learning (CL)-based methodologies have been developed to refine LMs, enabling them to adapt to new tasks without forgetting previous knowledge. However, a systematic taxonomy of existing approaches and a comparison of their performance are still lacking, which is the gap that our survey aims to fill. We delve into a comprehensive review, summarization, and classification of the existing literature on CL-based approaches applied to foundation language models, such as pre-trained language models (PLMs), large language models (LLMs) and vision-language models (VLMs). We divide these studies into offline CL and online CL, which consist of traditional methods, parameter-efficient-based methods, instruction tuning-based methods and continual pre-training methods. Offline CL encompasses domain-incremental learning, task-incremental learning, and class-incremental learning, while online CL is subdivided into hard task boundary and blurry task boundary settings. Additionally, we outline the typical datasets and metrics employed in CL research and provide a detailed analysis of the challenges and future work for LMs-based continual learning.
Related papers
- Continual Learning Should Move Beyond Incremental Classification [51.23416308775444]
Continual learning (CL) is the sub-field of machine learning concerned with accumulating knowledge in dynamic environments.
Here, we argue that maintaining a focus on incremental classification limits both theoretical development and practical applicability of CL methods.
We identify three fundamental challenges: (C1) the nature of continuity in learning problems, (C2) the choice of appropriate spaces and metrics for measuring similarity, and (C3) the role of learning objectives beyond classification.
Date: 2025-02-17T15:40:13Z
- LLMCL-GEC: Advancing Grammatical Error Correction with LLM-Driven Curriculum Learning [44.010834543396165]
Large-scale language models (LLMs) have demonstrated remarkable capabilities in specific natural language processing (NLP) tasks.
However, they may still lack proficiency compared to specialized models in certain domains, such as grammatical error correction (GEC).
Date: 2024-12-17T05:09:07Z
- LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements [50.544186914115045]
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning.
TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
Date: 2024-12-09T18:43:56Z
- Zero-shot Model-based Reinforcement Learning using Large Language Models [12.930241182192988]
We investigate how pre-trained Large Language Models can be leveraged to predict in context the dynamics of continuous Markov decision processes.
We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning.
Date: 2024-10-15T15:46:53Z
- Probing the Decision Boundaries of In-context Learning in Large Language Models [31.977886254197138]
We propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification.
To our surprise, we find that the decision boundaries learned by current LLMs in simple binary classification tasks are often irregular and non-smooth.
Date: 2024-06-17T06:00:24Z
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
Date: 2024-04-11T04:22:15Z
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
Date: 2024-02-02T12:34:09Z
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of former knowledge when learning new knowledge.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
Date: 2024-01-29T18:27:52Z
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
Date: 2023-12-26T07:24:46Z
- Continual Lifelong Learning in Natural Language Processing: A Survey [3.9103337761169943]
Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time.
It is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge.
We look at the problem of CL through the lens of various NLP tasks.
Date: 2020-12-17T18:44:36Z
This list is automatically generated from the titles and abstracts of the papers on this site.