Adapting a Language Model While Preserving its General Knowledge
- URL: http://arxiv.org/abs/2301.08986v1
- Date: Sat, 21 Jan 2023 17:57:53 GMT
- Title: Adapting a Language Model While Preserving its General Knowledge
- Authors: Zixuan Ke, Yijia Shao, Haowei Lin, Hu Xu, Lei Shu and Bing Liu
- Abstract summary: Domain-adaptive pre-training (or DA-training for short) aims to train a pre-trained general-purpose language model (LM) using an unlabeled corpus of a particular domain to adapt the LM to that domain.
Existing DA-training methods are in some sense blind as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus.
This paper shows that the existing methods are suboptimal and proposes a novel method to perform a more informed adaptation of the knowledge in the LM.
- Score: 22.083108548675494
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Domain-adaptive pre-training (or DA-training for short), also known as
post-training, aims to train a pre-trained general-purpose language model (LM)
using an unlabeled corpus of a particular domain to adapt the LM so that
end-tasks in the domain can achieve improved performance. However, existing
DA-training methods are in some sense blind as they do not explicitly identify
what knowledge in the LM should be preserved and what should be changed by the
domain corpus. This paper shows that the existing methods are suboptimal and
proposes a novel method to perform a more informed adaptation of the knowledge
in the LM by (1) soft-masking the attention heads based on their importance to
best preserve the general knowledge in the LM and (2) contrasting the
representations of the general and the full (general plus domain) knowledge
to learn an integrated representation with both general and domain-specific
knowledge. Experimental results demonstrate the effectiveness of the proposed
approach.
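The abstract's two ideas lend themselves to a short sketch. Below is a minimal, illustrative Python/PyTorch sketch, not the authors' implementation: head importance scores, the gradient-scaling scheme, and the contrastive pairing are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) soft-mask attention heads in proportion to an importance score so that
#     heads important for general knowledge receive smaller gradient updates, and
# (2) a contrastive term pulling the full representation toward the
#     general-knowledge (masked) view of the same input.
import torch
import torch.nn.functional as F

def soft_mask_head_gradients(attn_grad, head_importance, num_heads):
    """Scale per-head gradient blocks by (1 - importance)."""
    # attn_grad: (hidden, hidden) gradient of an attention projection matrix
    hidden = attn_grad.shape[0]
    head_dim = hidden // num_heads
    scale = (1.0 - head_importance).repeat_interleave(head_dim)  # (hidden,)
    return attn_grad * scale.unsqueeze(1)

def contrastive_loss(full_repr, general_repr, temperature=0.05):
    """InfoNCE-style loss treating (full, general) views of the same
    sentence as positives and other sentences in the batch as negatives."""
    z1 = F.normalize(full_repr, dim=-1)
    z2 = F.normalize(general_repr, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch)
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    batch, hidden, num_heads = 8, 768, 12
    head_importance = torch.rand(num_heads)      # stand-in for a gradient-based score
    grad = torch.randn(hidden, hidden)
    masked_grad = soft_mask_head_gradients(grad, head_importance, num_heads)
    loss = contrastive_loss(torch.randn(batch, hidden), torch.randn(batch, hidden))
    print(masked_grad.shape, float(loss))
```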
Related papers
- Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment [120.06538000214552]
Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions.
We propose a new domain adaptation framework, called Mix-CPT, that includes domain knowledge learning and general format alignment.
Our proposed Mix-CPT framework can simultaneously improve the task-solving capabilities of LLMs on the target and general domains.
arXiv Detail & Related papers (2024-07-15T15:20:13Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Evolving Domain Adaptation of Pretrained Language Models for Text Classification [24.795214770636534]
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection.
This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method.
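A minimal sketch of the incremental self-training idea mentioned above, written as my own illustration rather than the paper's procedure; the synthetic data, drift schedule, and confidence threshold are assumptions.

```python
# Incremental self-training over a stream of domain-shifted batches: at each
# step the current classifier pseudo-labels the incoming unlabeled batch, and
# only high-confidence predictions are used to update the model.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss", random_state=0)

# Initial labeled data from the source period (synthetic stand-in).
X0 = rng.normal(size=(200, 20))
y0 = (X0[:, 0] > 0).astype(int)
clf.partial_fit(X0, y0, classes=np.array([0, 1]))

for t in range(1, 6):                      # stream of evolving domains
    drift = 0.3 * t                        # simulate gradual distribution shift
    Xt = rng.normal(loc=drift, size=(200, 20))
    proba = clf.predict_proba(Xt)
    conf = proba.max(axis=1)
    keep = conf >= 0.8                     # confidence threshold (assumed)
    if keep.any():
        pseudo = proba.argmax(axis=1)
        clf.partial_fit(Xt[keep], pseudo[keep])
    print(f"step {t}: kept {keep.sum()} pseudo-labeled examples")
```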
arXiv Detail & Related papers (2023-11-16T08:28:00Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
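The three-step pipeline can be sketched in a few lines. This is an illustrative toy, not the DOKE implementation: the knowledge base, keyword-match selection, and prompt template are all assumptions.

```python
# A minimal sketch of the three-step paradigm: 1) prepare domain knowledge,
# 2) select knowledge per sample, 3) express it in a form the LLM can consume
# (here, a prompt prefix).

def prepare_knowledge():
    # Step 1: a domain knowledge extractor would populate this; toy facts here.
    return {
        "item_42": "Item 42 is a waterproof hiking boot, popular in winter.",
        "item_17": "Item 17 is a lightweight running shoe for road use.",
    }

def select_knowledge(sample, knowledge_base):
    # Step 2: pick facts relevant to the sample (naive keyword match here).
    return [fact for key, fact in knowledge_base.items() if key in sample["mentions"]]

def express_knowledge(sample, facts):
    # Step 3: verbalize the selected facts so a frozen LLM can use them.
    context = " ".join(facts)
    return f"Known facts: {context}\nQuestion: {sample['query']}"

kb = prepare_knowledge()
sample = {"query": "Which item suits a snowy hike?", "mentions": ["item_42"]}
prompt = express_knowledge(sample, select_knowledge(sample, kb))
print(prompt)
```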
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Continual Pre-training of Language Models [11.59945701446951]
Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain.
This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting the LM to these domains to improve their end-task performance.
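To make the sequential setup concrete, here is a minimal, assumption-laden sketch (not the paper's method): a single model is further trained on one domain corpus after another with the same objective. The toy model, fake corpora, and domain names are placeholders.

```python
# Continual domain-adaptive pre-training over a sequence of unlabeled corpora.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 16, 1000))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def fake_corpus(num_batches=5, batch_size=8, seq_len=16):
    # Stand-in for an unlabeled domain corpus of token-id sequences.
    return [torch.randint(0, 1000, (batch_size, seq_len)) for _ in range(num_batches)]

domain_corpora = {"restaurants": fake_corpus(), "ai_papers": fake_corpus(), "phones": fake_corpus()}

for domain, corpus in domain_corpora.items():        # domains arrive sequentially
    for batch in corpus:
        inputs = batch.clone()
        targets = batch[:, 0]                        # toy "masked token" objective
        inputs[:, 0] = 0                             # mask the first position
        logits = model(inputs)
        loss = loss_fn(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"finished DAP-training on domain: {domain}")
```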
arXiv Detail & Related papers (2023-02-07T03:57:55Z)
- Prior Knowledge Guided Unsupervised Domain Adaptation [82.9977759320565]
We propose a Knowledge-guided Unsupervised Domain Adaptation (KUDA) setting where prior knowledge about the target class distribution is available.
In particular, we consider two specific types of prior knowledge about the class distribution in the target domain: Unary Bound and Binary Relationship.
We propose a rectification module that uses such prior knowledge to refine model generated pseudo labels.
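A minimal sketch of how a unary-bound prior could refine pseudo labels; this is my illustration, not the KUDA rectification module, and the down-weighting rule and synthetic data are assumptions.

```python
# Use a prior upper bound on target class proportions to refine pseudo labels:
# classes pseudo-labeled more often than the bound allows are down-weighted
# before re-taking the argmax.
import numpy as np

def rectify_pseudo_labels(probs, upper_bound):
    """probs: (n, num_classes) softmax outputs; upper_bound: (num_classes,) max class proportions."""
    n, num_classes = probs.shape
    preds = probs.argmax(axis=1)
    freq = np.bincount(preds, minlength=num_classes) / n
    # Down-weight classes whose predicted frequency exceeds the prior bound.
    weight = np.where(freq > upper_bound, upper_bound / np.maximum(freq, 1e-8), 1.0)
    return (probs * weight).argmax(axis=1)

rng = np.random.default_rng(1)
logits = rng.normal(size=(100, 3)) + np.array([1.0, 0.0, 0.0])   # biased toward class 0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
prior_upper_bound = np.array([0.4, 0.5, 0.5])                     # prior: class 0 at most 40%
refined = rectify_pseudo_labels(probs, prior_upper_bound)
print(np.bincount(refined, minlength=3) / 100)
```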
arXiv Detail & Related papers (2022-07-18T18:41:36Z)
- KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
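One way to picture "modulating intermediate hidden representations with domain knowledge" is a feature-wise scale-and-shift conditioned on retrieved entity embeddings. The sketch below is an assumption-marked illustration, not the KALA implementation; dimensions and the retrieval step are placeholders.

```python
# Knowledge-conditioned modulation of a PLM's intermediate hidden states.
import torch
import torch.nn as nn

class KnowledgeModulation(nn.Module):
    def __init__(self, hidden_dim, knowledge_dim):
        super().__init__()
        self.to_scale = nn.Linear(knowledge_dim, hidden_dim)
        self.to_shift = nn.Linear(knowledge_dim, hidden_dim)

    def forward(self, hidden, knowledge):
        # hidden:    (batch, seq, hidden_dim) intermediate PLM states
        # knowledge: (batch, seq, knowledge_dim) retrieved entity embeddings
        scale = 1.0 + self.to_scale(knowledge)   # near-identity for zero knowledge
        shift = self.to_shift(knowledge)
        return scale * hidden + shift

mod = KnowledgeModulation(hidden_dim=768, knowledge_dim=100)
hidden = torch.randn(2, 16, 768)
knowledge = torch.randn(2, 16, 100)
print(mod(hidden, knowledge).shape)   # torch.Size([2, 16, 768])
```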
arXiv Detail & Related papers (2022-04-22T08:11:59Z)
- Domain-oriented Language Pre-training with Adaptive Hybrid Masking and Optimal Transport Alignment [43.874781718934486]
We provide a general domain-oriented approach to adapt pre-trained language models for different application domains.
To preserve phrase knowledge effectively, we build a domain phrase pool as an auxiliary training tool.
We introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models.
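As a rough illustration of how a domain phrase pool could drive masking, the sketch below masks pooled phrases as whole spans and falls back to token-level masking otherwise. It is not the paper's adaptive hybrid masking procedure; the pool contents and probabilities are assumptions.

```python
# Phrase-aware masking driven by a domain phrase pool.
import random

random.seed(0)
PHRASE_POOL = {("battery", "life"), ("screen", "resolution")}   # assumed pool
MASK = "[MASK]"

def phrase_aware_mask(tokens, phrase_prob=0.5, token_prob=0.15):
    tokens = list(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for phrase in PHRASE_POOL:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase and random.random() < phrase_prob:
                tokens[i:i + n] = [MASK] * n        # mask the whole domain phrase
                i += n
                matched = True
                break
        if not matched:
            if random.random() < token_prob:
                tokens[i] = MASK                    # ordinary token-level masking
            i += 1
    return tokens

print(phrase_aware_mask("the battery life of this phone is great".split()))
```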
arXiv Detail & Related papers (2021-12-01T15:47:01Z)
- Domain Adaption for Knowledge Tracing [65.86619804954283]
We propose a novel adaptable knowledge tracing (AKT) framework to address the DAKT (domain adaptation for knowledge tracing) problem.
For the first aspect, we incorporate the educational characteristics (e.g., slip, guess, question texts) based on deep knowledge tracing (DKT) to obtain a well-performing knowledge tracing model.
For the second aspect, we propose and adopt three domain adaptation processes. First, we pre-train an auto-encoder to select useful source instances for target model training.
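A minimal sketch of that first adaptation step, as I understand it and not as the paper implements it: an auto-encoder fit on target-domain features keeps only the source instances it reconstructs well, i.e. those that look target-like. The feature dimensions, architecture, and median cutoff are assumptions.

```python
# Auto-encoder-based selection of useful source instances.
import torch
import torch.nn as nn

torch.manual_seed(0)
ae = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)

target_feats = torch.randn(500, 32)                  # target-domain features
source_feats = torch.randn(1000, 32) + 0.5           # shifted source domain

for _ in range(200):                                 # fit the AE on the target domain
    recon = ae(target_feats)
    loss = nn.functional.mse_loss(recon, target_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    errors = ((ae(source_feats) - source_feats) ** 2).mean(dim=1)
selected = source_feats[errors < errors.median()]    # keep the target-like half
print(f"selected {selected.shape[0]} of {source_feats.shape[0]} source instances")
```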
arXiv Detail & Related papers (2020-01-14T15:04:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.