Knowledge-Aware Language Model Pretraining
- URL: http://arxiv.org/abs/2007.00655v2
- Date: Thu, 4 Feb 2021 06:54:39 GMT
- Title: Knowledge-Aware Language Model Pretraining
- Authors: Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett,
Saurabh Tiwary
- Abstract summary: We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
- Score: 29.56904859722379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How much knowledge do pretrained language models hold? Recent research
observed that pretrained transformers are adept at modeling semantics but it is
unclear to what degree they grasp human knowledge, or how to ensure they do so.
In this paper we incorporate knowledge-awareness in language model pretraining
without changing the transformer architecture, inserting explicit knowledge
layers, or adding external storage of semantic information. Rather, we simply
signal the existence of entities to the input of the transformer in
pretraining, with an entity-extended tokenizer; and at the output, with an
additional entity prediction task. Our experiments show that solely by adding
these entity signals in pretraining, significantly more knowledge is packed
into the transformer parameters: we observe improved language modeling
accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in
the hidden representations through edge probing. We also show that our
knowledge-aware language model (KALM) can serve as a drop-in replacement for
GPT-2 models, significantly improving downstream tasks like zero-shot
question-answering with no task-related training.
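To make the recipe concrete, below is a minimal, hypothetical PyTorch sketch of the two signals the abstract describes: an entity embedding added on the input side and an auxiliary entity-prediction head on the output side, trained jointly with the usual next-token loss. All module names, sizes, and the loss weighting are illustrative assumptions rather than the authors' released implementation, and the input-side signal is realized here as a parallel entity embedding stream, which is only one way to mimic an entity-extended tokenizer.

```python
import torch
import torch.nn as nn

class KALMSketch(nn.Module):
    """Minimal sketch of the two entity signals: an entity-aware input
    embedding and an auxiliary entity-prediction head at the output.
    Names and sizes are illustrative assumptions, not the released model."""

    def __init__(self, token_vocab=50257, entity_vocab=10000, d_model=256,
                 n_head=4, n_layer=2):
        super().__init__()
        self.tok_emb = nn.Embedding(token_vocab, d_model)
        self.ent_emb = nn.Embedding(entity_vocab, d_model)      # input-side entity signal
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)   # stand-in for a GPT-2 stack
        self.lm_head = nn.Linear(d_model, token_vocab)          # standard next-token head
        self.ent_head = nn.Linear(d_model, entity_vocab)        # output-side entity prediction task

    def forward(self, token_ids, entity_ids):
        # Positional encodings omitted for brevity; causal mask keeps it autoregressive.
        h = self.tok_emb(token_ids) + self.ent_emb(entity_ids)
        causal = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        h = self.backbone(h, mask=causal)
        return self.lm_head(h), self.ent_head(h)

def pretraining_loss(model, token_ids, entity_ids, ent_weight=0.5):
    """Joint objective: shifted next-token loss plus the entity-prediction loss."""
    tok_logits, ent_logits = model(token_ids[:, :-1], entity_ids[:, :-1])
    ce = nn.CrossEntropyLoss()
    lm_loss = ce(tok_logits.reshape(-1, tok_logits.size(-1)), token_ids[:, 1:].reshape(-1))
    ent_loss = ce(ent_logits.reshape(-1, ent_logits.size(-1)), entity_ids[:, 1:].reshape(-1))
    return lm_loss + ent_weight * ent_loss

# Toy usage; entity id 0 can stand for "no entity at this position".
model = KALMSketch()
tokens = torch.randint(0, 50257, (2, 16))
entities = torch.randint(0, 10000, (2, 16))
print(pretraining_loss(model, tokens, entities).item())
```

The point of the sketch is that no architectural change is needed: the backbone is an ordinary transformer, and knowledge-awareness enters only through the extra input embedding and the extra output head.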
Related papers
- Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with a differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
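The decoupling idea can be sketched compactly. The snippet below is a hypothetical illustration of a differentiable key-value memory that hidden states attend over, with the knowledge kept in editable tensors rather than in the transformer's weights; the class name, dimensions, and edit_slot helper are assumptions for illustration, not PlugLM's actual interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlugInMemory(nn.Module):
    """Illustrative differentiable key-value memory (not PlugLM's real API)."""

    def __init__(self, n_slots=1024, d_model=256):
        super().__init__()
        # Knowledge lives in plain tensors: editable and scalable without retraining.
        self.keys = nn.Parameter(torch.randn(n_slots, d_model))
        self.values = nn.Parameter(torch.randn(n_slots, d_model))

    def forward(self, hidden):                      # hidden: (batch, seq, d_model)
        scores = hidden @ self.keys.t()             # similarity to every memory slot
        weights = F.softmax(scores, dim=-1)         # differentiable soft lookup
        retrieved = weights @ self.values           # (batch, seq, d_model)
        return hidden + retrieved                   # fuse retrieved knowledge back in

    @torch.no_grad()
    def edit_slot(self, idx, new_key, new_value):
        """Plug-and-play editing: overwrite one slot without touching model weights."""
        self.keys[idx] = new_key
        self.values[idx] = new_value

memory = PlugInMemory()
h = torch.randn(2, 8, 256)
print(memory(h).shape)                              # torch.Size([2, 8, 256])
```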
arXiv Detail & Related papers (2023-05-19T10:01:55Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Language Model-Based Paired Variational Autoencoders for Robotic Language Learning [18.851256771007748]
Similar to human infants, artificial agents can learn language while interacting with their environment.
We present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object manipulation scenario.
Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model.
arXiv Detail & Related papers (2022-01-17T10:05:26Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the Transformer's feed-forward layers.
We empirically find that simply injecting knowledge into the FFN can strengthen the pre-trained language model's abilities and benefit current knowledge fusion methods.
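One way to read "injecting knowledge into the FFN" is to append knowledge embeddings to the feed-forward layer's key and value matrices, so retrieved facts participate in the same matrix products as the learned FFN slots. The sketch below illustrates that reading; the function and tensor names are hypothetical and not taken from the Kformer code.

```python
import torch
import torch.nn.functional as F

def knowledge_ffn(hidden, w_in, w_out, k_emb):
    """Hypothetical FFN with injected knowledge, in the spirit of Kformer.

    The standard FFN computes act(hidden @ w_in) @ w_out.  Here the knowledge
    embeddings are appended as extra "key" rows and "value" columns, so the
    injected facts are mixed in by the same matrix products as the learned
    FFN memory slots.
    hidden: (batch, seq, d);  w_in: (d, d_ff);  w_out: (d_ff, d);  k_emb: (n_k, d)
    """
    keys = torch.cat([w_in, k_emb.t()], dim=1)       # (d, d_ff + n_k)
    values = torch.cat([w_out, k_emb], dim=0)        # (d_ff + n_k, d)
    return F.gelu(hidden @ keys) @ values

d, d_ff, n_k = 64, 256, 10
hidden = torch.randn(2, 8, d)
w_in, w_out = torch.randn(d, d_ff), torch.randn(d_ff, d)
knowledge = torch.randn(n_k, d)                      # e.g. embedded triples or passages
print(knowledge_ffn(hidden, w_in, w_out, knowledge).shape)   # torch.Size([2, 8, 64])
```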
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
- HYDRA -- Hyper Dependency Representation Attentions [4.697611383288171]
We propose lightweight pretrained linguistic self-attention heads to inject knowledge into transformer models without pretraining them again.
Our approach strikes a balance between letting the models learn unsupervised and rigidly forcing them to conform to linguistic knowledge.
We empirically verify our framework on benchmark datasets to show the contribution of linguistic knowledge to a transformer model.
arXiv Detail & Related papers (2021-09-11T19:17:34Z)
- Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge.
Besides being computationally efficient, KnowledgeEditor does not require any modifications to LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining [5.178964604577459]
We focus on improving model pretraining by leveraging explicit knowledge.
To be specific, we first match knowledge facts from a knowledge graph (KG) and then add a knowledge injection layer directly to the transformer.
The experimental results show that solely adding external knowledge to the transformer can improve learning performance on many NLP tasks.
arXiv Detail & Related papers (2021-03-25T06:14:18Z)
- Modifying Memories in Transformer Models [71.48657481835767]
We propose a new task of explicitly modifying specific factual knowledge in Transformer models.
This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models.
arXiv Detail & Related papers (2020-12-01T09:39:13Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
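The retrieve-then-predict idea can be illustrated with a toy marginalization: score documents against the query, softmax over the top-k to get a latent retrieval distribution, and marginalize the answer likelihood over those documents. The snippet below is an illustrative sketch under those assumptions, not REALM's implementation; `answer_logprob_given_doc` is a hypothetical stand-in for the reader model.

```python
import torch
import torch.nn.functional as F

def retrieval_augmented_score(query_vec, doc_vecs, answer_logprob_given_doc, k=5):
    """Toy marginalization over retrieved documents (illustrative, not REALM's code).

    query_vec: (d,) dense encoding of the input
    doc_vecs:  (n_docs, d) dense encodings of the corpus
    answer_logprob_given_doc: callable giving log p(answer | input, doc) for a doc index
    """
    scores = doc_vecs @ query_vec                   # relevance of every document
    top_scores, top_idx = scores.topk(k)            # retrieve the top-k documents
    log_p_doc = F.log_softmax(top_scores, dim=0)    # latent retrieval distribution p(doc | input)
    log_p_ans = torch.stack([answer_logprob_given_doc(i) for i in top_idx.tolist()])
    # p(answer | input) = sum over retrieved docs of p(doc | input) * p(answer | input, doc)
    return torch.logsumexp(log_p_doc + log_p_ans, dim=0)

# Toy usage with random encodings and a dummy reader model.
torch.manual_seed(0)
query = torch.randn(128)
corpus = torch.randn(1000, 128)
reader = lambda doc_idx: torch.tensor(-2.0)         # stand-in log-likelihood from the LM
print(retrieval_augmented_score(query, corpus, reader).item())
```

Because the retrieval distribution sits inside the marginal likelihood, gradients reach the retriever through ordinary backpropagation, which is what allows it to be trained without retrieval supervision.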
arXiv Detail & Related papers (2020-02-10T18:40:59Z)