Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
- URL: http://arxiv.org/abs/2312.07887v5
- Date: Thu, 8 Aug 2024 03:49:59 GMT
- Title: Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
- Authors: Junhao Zheng, Shengjie Qiu, Qianli Ma
- Abstract summary: Most prior studies assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance.
We propose a frustratingly easy method called SEQ* for IL with PLMs.
Results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods.
- Score: 21.95081572612883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incremental Learning (IL) has been a long-standing problem in both the vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress on various NLP downstream tasks, using PLMs as backbones has become common practice in recent IL research in NLP. Most studies assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome it. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on this observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* achieves competitive or superior performance compared to state-of-the-art (SOTA) IL methods while requiring considerably fewer trainable parameters and less training time. These findings urge us to revisit IL with PLMs and encourage future studies to develop a fundamental understanding of catastrophic forgetting in PLMs. The data, code and scripts are publicly available at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm.
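The abstract evaluates methods under two IL settings, Class-Incremental (CIL) and Task-Incremental (TIL). As a minimal sketch of how the two differ at inference time (this is the standard distinction between the settings, not code from the authors' repository), assume a single classifier head whose logits cover every class seen so far. The function names, the two-task class split, and the logit values below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def predict_class_incremental(logits: Sequence[float], seen_classes: Sequence[int]) -> int:
    """CIL: the task identity is unknown at test time, so take the argmax over all classes seen so far."""
    return max(seen_classes, key=lambda c: logits[c])


def predict_task_incremental(
    logits: Sequence[float], task_classes: Dict[int, List[int]], task_id: int
) -> int:
    """TIL: the task identity is given at test time, so restrict the argmax to that task's classes."""
    return max(task_classes[task_id], key=lambda c: logits[c])


# Hypothetical two-task split: task 0 owns classes 0-1, task 1 owns classes 2-3.
task_classes = {0: [0, 1], 1: [2, 3]}
seen = [c for classes in task_classes.values() for c in classes]

# Hypothetical logits for a test example from task 0 (true class 1) that are biased toward a newer class.
logits = [0.2, 1.0, 1.5, 0.3]

print(predict_class_incremental(logits, seen))                    # 2: pulled toward a task-1 class
print(predict_task_incremental(logits, task_classes, task_id=0))  # 1: correct within task 0
```

The same logits yield a wrong prediction under CIL but a correct one under TIL, which is one reason CIL is generally regarded as the harder of the two settings.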
Related papers
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs.
LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data.
Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
Date: 2025-01-24T06:39:38Z
- StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models [10.359008237358603]
Planning and acting to solve 'real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods.
We propose a simple method based on few-shot in-context learning alone to enhance 'chain-of-thought' with state tracking.
We show that our method establishes a new state of the art on Alfworld for in-context learning methods.
Date: 2024-09-21T05:54:35Z
- FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
We introduce FIRST, a novel listwise LLM reranking approach that uses the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining robust ranking performance, with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
Date: 2024-06-21T21:27:50Z
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
Date: 2024-06-21T20:06:31Z
- Online Cascade Learning for Efficient Inference over Streams [9.516197133796437]
Large Language Models (LLMs) have a natural role in answering complex queries about data streams.
We propose online cascade learning, the first approach to address this challenge.
We formulate the task of learning cascades online as an imitation-learning problem.
Date: 2024-02-07T01:46:50Z
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations demonstrate how comprehensively language models grasp and understand the questions posed to them.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
Date: 2023-09-20T09:23:46Z
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate over-smoothing.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
Date: 2023-05-09T11:00:02Z
- On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code [12.708117108874083]
Pre-trained language models (PLMs) have become a prevalent technique in deep learning for code.
We study two widely used PLM architectures on two downstream tasks, API call and API usage prediction.
To address these issues, we implement five continual learning approaches, including replay-based and regularization-based methods.
Date: 2023-05-06T18:00:21Z
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
Date: 2022-05-23T10:11:50Z
- Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
Date: 2022-05-11T14:25:13Z
This list is automatically generated from the titles and abstracts of the papers in this site.