Enhancing Language Models with Plug-and-Play Large-Scale Commonsense
- URL: http://arxiv.org/abs/2109.02572v1
- Date: Mon, 6 Sep 2021 16:16:10 GMT
- Title: Enhancing Language Models with Plug-and-Play Large-Scale Commonsense
- Authors: Wanyun Cui, Xingran Chen
- Abstract summary: We study how to enhance language models (LMs) with textual commonsense knowledge.
We propose a plug-and-play method for large-scale commonsense integration without pre-training.
- Score: 2.1248439796866228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how to enhance language models (LMs) with textual commonsense
knowledge. Previous work (e.g., KnowBERT) has focused on integrating entity
knowledge from knowledge graphs. In order to introduce the external entity
embeddings, they learn to jointly represent the original sentences and external
knowledge by pre-training on a large-scale corpus. However, when switching to
textual commonsense, unlike the lightweight entity embeddings, encoding
commonsense descriptions is heavy. Therefore, pre-training to jointly represent
the target sentence and external commonsense descriptions is unaffordable. On
the other hand, since pre-trained LMs for representing the target sentences
alone are readily available, is it feasible to introduce commonsense knowledge
in downstream tasks only by fine-tuning them? In this
paper, we propose a plug-and-play method for large-scale commonsense
integration without pre-training. Our method is inspired by the observation
that, in regular fine-tuning for downstream tasks where no external knowledge
is introduced, the parameters of the language model vary only slightly. Our
method starts from a pre-trained LM that represents the target sentences only
(e.g., BERT). We argue that pre-training for joint representation learning can
be avoided if the joint representation has only a minor impact on the
parameters of the starting LM. Previous methods such as KnowBERT proposed
complex modifications to the vanilla LM to introduce external knowledge. Our
model, COOK-Transformer (COmmOnsense Knowledge-enhanced Transformer), on the
other hand, hardly changes the vanilla LM beyond adding a knowledge token in
each Transformer layer. In a variety of experiments, COOK-Transformer-based
BERT/RoBERTa improve their performance without any pre-training.
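The abstract only states that COOK-Transformer adds a knowledge token in each Transformer layer; the exact architecture is not reproduced here. The following is a minimal sketch of that idea in PyTorch, assuming the knowledge token is a pooled encoding of retrieved commonsense descriptions that is prepended to each layer's hidden states so that ordinary tokens can attend to it. The class name `KnowledgeTokenLayer` and the projection are hypothetical, not the authors' released code.

```python
import torch
import torch.nn as nn


class KnowledgeTokenLayer(nn.Module):
    """Wraps a vanilla Transformer encoder layer and prepends one
    "knowledge token" so ordinary tokens can attend to external
    commonsense knowledge. Hypothetical sketch, not the paper's code."""

    def __init__(self, base_layer: nn.TransformerEncoderLayer, hidden_size: int):
        super().__init__()
        self.base_layer = base_layer                           # pre-trained layer, left unchanged
        self.know_proj = nn.Linear(hidden_size, hidden_size)   # only new parameters

    def forward(self, hidden_states: torch.Tensor, knowledge_vec: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); knowledge_vec: (batch, hidden)
        know_token = self.know_proj(knowledge_vec).unsqueeze(1)   # (batch, 1, hidden)
        extended = torch.cat([know_token, hidden_states], dim=1)  # prepend the knowledge token
        out = self.base_layer(extended)                           # vanilla self-attention + FFN
        return out[:, 1:, :]                                      # drop the knowledge slot again


# Toy usage: wrap every layer of a 12-layer encoder; the base weights stay intact.
hidden = 768
layers = nn.ModuleList(
    [KnowledgeTokenLayer(
        nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True), hidden)
     for _ in range(12)]
)
x = torch.randn(2, 16, hidden)   # toy sentence hidden states
k = torch.randn(2, hidden)       # toy pooled commonsense encoding
for layer in layers:
    x = layer(x, k)
```

Because the pre-trained layer itself is untouched and only the small projection is new, the whole stack can be fine-tuned directly on a downstream task, which is consistent with the plug-and-play claim above.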
Related papers
- Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with a differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory (see the key-value memory sketch after this list).
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Empowering Language Models with Knowledge Graph Reasoning for Question Answering [117.79170629640525]
We propose the knOwledge REasOning empowered Language Model (OREO-LM).
OREO-LM consists of a novel Knowledge Interaction Layer that can be flexibly plugged into existing Transformer-based LMs.
We show significant performance gains, achieving state-of-the-art results in the Closed-Book setting.
arXiv Detail & Related papers (2022-11-15T18:26:26Z)
- Understanding Knowledge Integration in Language Models with Graph Convolutions [28.306949176011763]
Knowledge integration (KI) methods aim to incorporate external knowledge into pretrained language models (LMs).
This paper revisits the KI process in these models with an information-theoretic view and shows that KI can be interpreted using a graph convolution operation.
We analyze two well-known knowledge-enhanced LMs: ERNIE and K-Adapter, and find that only a small amount of factual knowledge is integrated in them.
arXiv Detail & Related papers (2022-02-02T11:23:36Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can improve the pre-trained language model's ability and facilitate current knowledge fusion methods (see the FFN-injection sketch after this list).
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining [5.178964604577459]
We focus on improving model pretraining by leveraging explicit knowledge.
To be specific, we first match knowledge facts from a knowledge graph (KG) and then add a knowledge injection layer to the transformer directly.
The experimental results show that simply adding external knowledge to the transformer can improve learning performance on many NLP tasks.
arXiv Detail & Related papers (2021-03-25T06:14:18Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)
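For the PlugLM entry above, here is a minimal sketch of the decoupling idea: a differentiable key-value memory stored outside the model's trained parameters, so entries can be edited or scaled up without retraining the LM. The class `PluginKeyValueMemory`, its `write` method, and the residual fusion are assumptions for illustration, not the released PlugLM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PluginKeyValueMemory(nn.Module):
    """Editable key-value memory kept in buffers rather than trained
    parameters. Hypothetical sketch of the PlugLM-style DPM idea."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        # Buffers, not parameters: the knowledge store can be swapped at any time.
        self.register_buffer("keys", torch.empty(0, hidden_size))
        self.register_buffer("values", torch.empty(0, hidden_size))

    @torch.no_grad()
    def write(self, keys: torch.Tensor, values: torch.Tensor) -> None:
        """Append new knowledge entries without touching model weights."""
        self.keys = torch.cat([self.keys, keys], dim=0)
        self.values = torch.cat([self.values, values], dim=0)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden)
        q = self.query_proj(hidden_states)           # queries from the LM
        attn = F.softmax(q @ self.keys.t(), dim=-1)  # attention over memory entries
        retrieved = attn @ self.values                # weighted knowledge read-out
        return hidden_states + retrieved              # residual fusion with the LM states


memory = PluginKeyValueMemory(hidden_size=768)
memory.write(torch.randn(1000, 768), torch.randn(1000, 768))  # toy knowledge entries
fused = memory(torch.randn(2, 16, 768))
```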
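For the Kformer entry above, this is a minimal sketch of injecting knowledge through the feed-forward layer: retrieved knowledge embeddings are projected into the FFN's key and value spaces and act as extra slots alongside the ordinary FFN path. The retrieval step, the GELU activation, and the class name `KnowledgeFFN` are assumptions for illustration, not the published Kformer implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeFFN(nn.Module):
    """Feed-forward block widened with projected knowledge embeddings,
    so injected facts behave like extra FFN key/value slots.
    Hypothetical sketch of the Kformer-style idea."""

    def __init__(self, hidden_size: int, ffn_size: int, know_size: int):
        super().__init__()
        self.w_in = nn.Linear(hidden_size, ffn_size)
        self.w_out = nn.Linear(ffn_size, hidden_size)
        self.know_key = nn.Linear(know_size, hidden_size)  # knowledge -> FFN "keys"
        self.know_val = nn.Linear(know_size, hidden_size)  # knowledge -> FFN "values"

    def forward(self, hidden_states: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); knowledge: (batch, n_facts, know_size)
        out = self.w_out(F.gelu(self.w_in(hidden_states)))     # ordinary FFN path
        keys = self.know_key(knowledge)                        # (batch, n_facts, hidden)
        vals = self.know_val(knowledge)
        scores = F.gelu(hidden_states @ keys.transpose(1, 2))  # activation over knowledge slots
        return out + scores @ vals                             # add the knowledge contribution


ffn = KnowledgeFFN(hidden_size=768, ffn_size=3072, know_size=768)
y = ffn(torch.randn(2, 16, 768), torch.randn(2, 8, 768))  # 8 retrieved facts per example
```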