Decouple knowledge from parameters for plug-and-play language modeling
- URL: http://arxiv.org/abs/2305.11564v2
- Date: Mon, 18 Sep 2023 09:42:26 GMT
- Title: Decouple knowledge from parameters for plug-and-play language modeling
- Authors: Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan
- Abstract summary: We introduce PlugLM, a pre-training model with differentiable plug-in memory(DPM)
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
- Score: 77.5601135412186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models(PLM) have made impressive results in various NLP
tasks. It has been revealed that one of the key factors to their success is the
parameters of these models implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic in that
knowledge is consistently evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with
differentiable plug-in memory(DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings including: (1) domain adaptation. PlugLM obtains 3.95 F1 improvements
across four domains on average without any in-domain pre-training. (2)
knowledge update. PlugLM could absorb new knowledge in a training-free way
after pre-training is done. (3) in-task knowledge learning. PlugLM could be
further improved by incorporating training samples into DPM with knowledge
prompting.
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z) - Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws [51.68385617116854]
Scaling laws describe the relationship between the size of language models and their capabilities.
We focus on factual knowledge represented as domains, such as (USA, capital, Washington D.C.) from a Wikipedia page.
A 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined.
arXiv Detail & Related papers (2024-04-08T11:11:31Z) - An Emulator for Fine-Tuning Large Language Models using Small Language
Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z) - Plug-and-Play Knowledge Injection for Pre-trained Language Models [116.37916535076478]
Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks.
Massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks.
We study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models.
arXiv Detail & Related papers (2023-05-28T10:58:00Z) - Can LMs Learn New Entities from Descriptions? Challenges in Propagating
Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts)
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z) - KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification
and Reasoning Abilities of Language Models [28.82149012250609]
We propose a benchmark, named Knowledge Memorization, Identification, and Reasoning test (KMIR)
KMIR covers 3 types of knowledge, including general knowledge, domain-specific knowledge, and commonsense, and provides 184,348 well-designed questions.
Preliminary experiments with various representative pre-training language models on KMIR reveal many interesting phenomenons.
arXiv Detail & Related papers (2022-02-28T03:52:57Z) - DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for
Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injecting from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that Decomposes Knowledge injection process of the Pre-trained Language Models in pre-training, fine-tuning and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z) - Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z) - REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA)
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.