Scalable Language Model with Generalized Continual Learning
- URL: http://arxiv.org/abs/2404.07470v1
- Date: Thu, 11 Apr 2024 04:22:15 GMT
- Title: Scalable Language Model with Generalized Continual Learning
- Authors: Bohao Peng, Zhuotao Tian, Shu Liu, Mingchang Yang, Jiaya Jia,
- Abstract summary: The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
- Score: 58.700439919096155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning has gained increasing importance as it facilitates the acquisition and refinement of scalable knowledge and skills in language models. However, existing methods typically encounter strict limitations and challenges in real-world scenarios, such as reliance on experience replay, optimization constraints, and inference task-ID. In this study, we introduce the Scalable Language Model (SLM) to overcome these limitations within a more challenging and generalized setting, representing a significant advancement toward practical applications for continual learning. Specifically, we propose the Joint Adaptive Re-Parameterization (JARe), integrated with Dynamic Task-related Knowledge Retrieval (DTKR), to enable adaptive adjustment of language models based on specific downstream tasks. This approach leverages the task distribution within the vector space, aiming to achieve a smooth and effortless continual learning process. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting. Moreover, while prior research primarily focused on a single task type such as classification, our study goes beyond, with the large language model, i.e., LLaMA-2, to explore the effects across diverse domains and task types, such that a single language model can be decently scaled to broader applications.
Related papers
- Modality-Inconsistent Continual Learning of Multimodal Large Language Models [37.15220266767881]
We introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal Large Language Models (MLLMs)
Unlike existing vision-only or modality-incremental settings, MICL combines modality and task type shifts, both of which drive catastrophic forgetting.
We propose MoInCL, which employs a Pseudo Targets Generation Module to mitigate forgetting caused by task type shifts in previously seen modalities.
arXiv Detail & Related papers (2024-12-17T16:13:56Z) - Align, Generate, Learn: A Novel Closed-Loop Framework for Cross-Lingual In-Context Learning [0.0]
Cross-lingual in-context learning (XICL) has emerged as a transformative paradigm for leveraging large language models (LLMs) to tackle multilingual tasks.
We propose a novel self-supervised framework that harnesses the generative capabilities of LLMs to internally select and utilize task-relevant examples.
arXiv Detail & Related papers (2024-12-12T05:36:51Z) - LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements [50.544186914115045]
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning.
TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
arXiv Detail & Related papers (2024-12-09T18:43:56Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond [16.913115978881866]
We propose a framework for unified task embeddings (FUTE), task embeddings from various models, including smaller language models and Large Language Models with varied prompts, within a single vector space.
Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios.
arXiv Detail & Related papers (2024-02-22T13:13:31Z) - Low-Rank Adaptation for Multilingual Summarization: An Empirical Study [60.541168233698194]
We investigate the potential of.
Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA) in the domain of multilingual summarization.
We conduct an extensive study across different data availability scenarios, including high- and low-data settings, and cross-lingual transfer.
Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer.
arXiv Detail & Related papers (2023-11-14T22:32:39Z) - Context-Aware Language Modeling for Goal-Oriented Dialogue Systems [84.65707332816353]
We formulate goal-oriented dialogue as a partially observed Markov decision process.
We derive a simple and effective method to finetune language models in a goal-aware way.
We evaluate our method on a practical flight-booking task using AirDialogue.
arXiv Detail & Related papers (2022-04-18T17:23:11Z) - On the Universality of Deep COntextual Language Models [15.218264849664715]
Deep Contextual Language Models (LMs) like ELMO, BERT, and their successors dominate the landscape of Natural Language Processing.
Multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer.
Due to this initial success, pre-trained models are being used as Universal Language Models'
arXiv Detail & Related papers (2021-09-15T08:00:33Z) - CALM: Continuous Adaptive Learning for Language Modeling [18.72860206714457]
Training large language representation models has become a standard in the natural language processing community.
We demonstrate that in practice these pre-trained models present performance deterioration in the form of catastrophic forgetting.
We propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to render models which retain knowledge across multiple domains.
arXiv Detail & Related papers (2020-04-08T03:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.