Continual Training of Language Models for Few-Shot Learning
- URL: http://arxiv.org/abs/2210.05549v1
- Date: Tue, 11 Oct 2022 15:43:58 GMT
- Title: Continual Training of Language Models for Few-Shot Learning
- Authors: Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, and Bing Liu
- Abstract summary: Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications.
Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain.
This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora.
The resulting system is called CPT (Continual Post-Training), which, to our knowledge, is the first continual post-training system.
- Score: 20.840674614655942
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent work on applying large language models (LMs) achieves impressive
performance in many NLP applications. Adapting or post-training an LM using an
unlabeled domain corpus can produce even better performance for end-tasks in
the domain. This paper proposes the problem of continually extending an LM by
incrementally post-training the LM with a sequence of unlabeled domain corpora to
expand its knowledge without forgetting its previous skills. The goal is to
improve the few-shot end-task learning in these domains. The resulting system
is called CPT (Continual Post-Training), which, to our knowledge, is the first
continual post-training system. Experimental results verify its effectiveness.
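To make the setup concrete, below is a minimal sketch of sequentially post-training one LM on a list of unlabeled domain corpora. It assumes the HuggingFace transformers and datasets libraries, a RoBERTa backbone, and hypothetical corpus file names; it performs only naive sequential masked-language-model training and omits the forgetting-avoidance components that define CPT.

```python
# Minimal sketch: sequential post-training with masked language modeling over a
# sequence of unlabeled domain corpora.
# Assumptions: HuggingFace transformers/datasets, a RoBERTa backbone, and
# hypothetical corpus file names; CPT's forgetting-avoidance machinery is omitted.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical sequence of unlabeled domain corpora (one plain-text file each).
domain_corpora = ["restaurant.txt", "acl_papers.txt", "ai_papers.txt", "phone_reviews.txt"]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

for i, corpus_path in enumerate(domain_corpora):
    raw = load_dataset("text", data_files=corpus_path, split="train")
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )
    args = TrainingArguments(
        output_dir=f"cpt_domain_{i}",
        per_device_train_batch_size=16,
        num_train_epochs=1,
        learning_rate=1e-4,
        save_strategy="no",
        report_to="none",
    )
    # The same model object is carried over, so each domain continues
    # post-training from the previous domain's weights.
    Trainer(model=model, args=args, train_dataset=tokenized,
            data_collator=collator).train()

# After post-training on each domain, the encoder would be fine-tuned on a
# few-shot end-task from that domain to evaluate few-shot learning.
```

The loop above only fixes the order in which domains are seen; in the paper, forgetting across domains is handled by dedicated continual-learning components not shown here.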
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Fine-tuning Large Language Models for Domain-specific Machine Translation [8.439661191792897]
Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
arXiv Detail & Related papers (2024-02-23T02:24:15Z) - Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models [21.95081572612883]
Most prior work assumes that catastrophic forgetting is the biggest obstacle to achieving superior IL performance.
We propose a frustratingly easy method called SEQ* for IL with PLMs.
Results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods.
arXiv Detail & Related papers (2023-12-13T04:14:22Z) - Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z) - Continual Pre-training of Language Models [11.59945701446951]
Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain.
This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting it to these domains to improve their end-task performance.
arXiv Detail & Related papers (2023-02-07T03:57:55Z) - Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z) - ELLE: Efficient Lifelong Pre-training for Emerging Data [91.52652408402815]
Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may continuously grow.
We propose ELLE, aiming at efficient lifelong pre-training for emerging data.
ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition (a toy sketch of function-preserving expansion follows this list); and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks.
arXiv Detail & Related papers (2022-03-12T01:53:53Z) - Multi-Stage Pre-training for Low-Resource Domain Adaptation [24.689862495171408]
Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks.
We show that extending the vocabulary of the LM with domain-specific terms leads to further gains.
We apply these approaches incrementally on a pre-trained RoBERTa-large LM and show considerable performance gains on three tasks in the IT domain.
arXiv Detail & Related papers (2020-10-12T17:57:00Z) - Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training [47.12438995938133]
We adapt pre-trained language models (PrLMs) to new domains without fine-tuning.
We present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs.
Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training.
arXiv Detail & Related papers (2020-09-24T08:04:37Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
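Referring back to the ELLE entry above, the sketch below illustrates what "function preserved model expansion" can mean in the simplest case: widening a two-layer feed-forward block by duplicating hidden units and rescaling their outgoing weights. This is Net2Net-style duplication used purely as an illustrative stand-in; `widen_ffn` and the toy dimensions are assumptions, not ELLE's actual implementation.

```python
# Toy function-preserving width expansion of a feed-forward block
# (Net2Net-style duplication; an illustrative stand-in, not ELLE's implementation).
import torch
import torch.nn as nn

def widen_ffn(linear_in: nn.Linear, linear_out: nn.Linear, new_hidden: int):
    """Widen linear_in/linear_out from old_hidden to new_hidden units while
    keeping the block's input-output function unchanged."""
    old_hidden = linear_in.out_features
    assert new_hidden >= old_hidden
    # Map each new unit to an existing unit: originals first, then random copies.
    idx = torch.randint(0, old_hidden, (new_hidden - old_hidden,))
    mapping = torch.cat([torch.arange(old_hidden), idx])

    wide_in = nn.Linear(linear_in.in_features, new_hidden)
    wide_out = nn.Linear(new_hidden, linear_out.out_features)
    with torch.no_grad():
        wide_in.weight.copy_(linear_in.weight[mapping])
        wide_in.bias.copy_(linear_in.bias[mapping])
        # Split each original unit's outgoing weights evenly among its copies,
        # so the widened block computes exactly the same function.
        counts = torch.bincount(mapping, minlength=old_hidden).float()
        wide_out.weight.copy_(linear_out.weight[:, mapping] / counts[mapping])
        wide_out.bias.copy_(linear_out.bias)
    return wide_in, wide_out

# Quick check that the function is preserved: duplicated units share identical
# activations, so the rescaled outgoing weights sum back to the original output.
ffn_in, ffn_out = nn.Linear(16, 64), nn.Linear(64, 16)
x = torch.randn(4, 16)
y_old = ffn_out(torch.relu(ffn_in(x)))
wide_in, wide_out = widen_ffn(ffn_in, ffn_out, 96)
y_new = wide_out(torch.relu(wide_in(x)))
print(torch.allclose(y_old, y_new, atol=1e-5))  # True
```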