KALA: Knowledge-Augmented Language Model Adaptation
- URL: http://arxiv.org/abs/2204.10555v1
- Date: Fri, 22 Apr 2022 08:11:59 GMT
- Title: KALA: Knowledge-Augmented Language Model Adaptation
- Authors: Minki Kang, Jinheon Baek, Sung Ju Hwang
- Abstract summary: We propose a novel domain adaption framework for pre-trained language models (PLMs)
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
- Score: 65.92457495576141
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PLMs) have achieved remarkable success on
various natural language understanding tasks. Simple fine-tuning of PLMs, on
the other hand, might be suboptimal for domain-specific tasks because they
cannot possibly cover knowledge from all domains. While adaptive pre-training
of PLMs can help them obtain domain-specific knowledge, it requires a large
training cost. Moreover, adaptive pre-training can harm the PLM's performance
on the downstream task by causing catastrophic forgetting of its general
knowledge. To overcome such limitations of adaptive pre-training for PLM
adaption, we propose a novel domain adaption framework for PLMs coined as
Knowledge-Augmented Language model Adaptation (KALA), which modulates the
intermediate hidden representations of PLMs with domain knowledge, consisting
of entities and their relational facts. We validate the performance of our KALA
on question answering and named entity recognition tasks on multiple datasets
across various domains. The results show that, despite being computationally
efficient, our KALA largely outperforms adaptive pre-training. Code is
available at: https://github.com/Nardien/KALA/.
Related papers
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning [5.599218556731767]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target domain using only unlabeled target data.
Reliability-based Curriculum Learning (RCL) integrates multiple MLLMs for knowledge exploitation via pseudo-labeling in SFDA.
arXiv Detail & Related papers (2024-05-28T17:18:17Z) - PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA)
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific
Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory(DPM)
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z) - Plan, Eliminate, and Track -- Language Models are Good Teachers for
Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z) - VarMAE: Pre-training of Variational Masked Autoencoder for
Domain-adaptive Language Understanding [5.1282202633907]
We propose a novel Transformer-based language model named VarMAE for domain-adaptive language understanding.
Under the masked autoencoding objective, we design a context uncertainty learning module to encode the token's context into a smooth latent distribution.
Experiments on science- and finance-domain NLU tasks demonstrate that VarMAE can be efficiently adapted to new domains with limited resources.
arXiv Detail & Related papers (2022-11-01T12:51:51Z) - Domain-oriented Language Pre-training with Adaptive Hybrid Masking and
Optimal Transport Alignment [43.874781718934486]
We provide a general domain-oriented approach to adapt pre-trained language models for different application domains.
To preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary training tool.
We introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models.
arXiv Detail & Related papers (2021-12-01T15:47:01Z) - RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.