Common Sense or World Knowledge? Investigating Adapter-Based Knowledge
Injection into Pretrained Transformers
- URL: http://arxiv.org/abs/2005.11787v2
- Date: Sun, 11 Oct 2020 11:31:03 GMT
- Title: Common Sense or World Knowledge? Investigating Adapter-Based Knowledge
Injection into Pretrained Transformers
- Authors: Anne Lauscher and Olga Majewska and Leonardo F. R. Ribeiro and Iryna
Gurevych and Nikolai Rozanov and Goran Glavaš
- Abstract summary: We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
- Score: 54.417299589288184
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Following the major success of neural language models (LMs) such as BERT or
GPT-2 on a variety of language understanding tasks, recent work focused on
injecting (structured) knowledge from external resources into these models.
While on the one hand, joint pretraining (i.e., training from scratch, adding
objectives based on external knowledge to the primary LM objective) may be
prohibitively computationally expensive, post-hoc fine-tuning on external
knowledge, on the other hand, may lead to the catastrophic forgetting of
distributional knowledge. In this work, we investigate models for complementing
the distributional knowledge of BERT with conceptual knowledge from ConceptNet
and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using
adapter training. While overall results on the GLUE benchmark paint an
inconclusive picture, a deeper analysis reveals that our adapter-based models
substantially outperform BERT (up to 15-20 performance points) on inference
tasks that require the type of conceptual knowledge explicitly present in
ConceptNet and OMCS. All code and experiments are open sourced under:
https://github.com/wluper/retrograph .
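The adapter training mentioned in the abstract can be illustrated with a minimal sketch. It assumes the widely used residual bottleneck adapter design (down-projection, nonlinearity, up-projection, skip connection). The class name, the sizes, the GELU nonlinearity, and the zero initialization are illustrative assumptions, not the authors' implementation, which is available in the repository linked above.

```python
import math
import random


def gelu(x):
    # Tanh approximation of the GELU nonlinearity (an assumed choice).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(matrix, vector):
    # Plain matrix-vector product over nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical residual adapter: h + W_up * gelu(W_down * h)."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Only these two small matrices would be trained on the external
        # knowledge; the pretrained transformer weights would stay frozen.
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        # Assumption: zero-initialized up-projection, so the adapter starts
        # as the identity and cannot disturb the pretrained representations.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def num_parameters(self):
        return (len(self.w_down) * len(self.w_down[0])
                + len(self.w_up) * len(self.w_up[0]))

    def forward(self, hidden):
        bottleneck = [gelu(x) for x in matvec(self.w_down, hidden)]
        delta = matvec(self.w_up, bottleneck)
        # Skip connection: the adapter only adds a correction to the input.
        return [h + d for h, d in zip(hidden, delta)]


if __name__ == "__main__":
    adapter = BottleneckAdapter(hidden_size=8, bottleneck_size=2)
    hidden = [0.1 * i for i in range(8)]
    print(adapter.num_parameters())
    print(adapter.forward(hidden) == hidden)
```

Because only the adapter parameters change during knowledge injection, the distributional knowledge stored in the frozen pretrained weights is left untouched, which is the motivation the abstract gives for preferring adapters over post-hoc fine-tuning.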
Related papers
- CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning [45.62134354858683]
CANDLE is a framework that iteratively performs conceptualization and instantiation over commonsense knowledge bases.
By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples.
arXiv Detail & Related papers (2024-01-14T13:24:30Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can improve the pre-trained language model's ability and benefit current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
- K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining [5.178964604577459]
We focus on improving model pretraining by leveraging explicit knowledge.
Specifically, we first match knowledge facts from a knowledge graph (KG) and then add a knowledge injection layer directly to the transformer.
The experimental results show that simply adding external knowledge to the transformer can improve learning performance on many NLP tasks.
arXiv Detail & Related papers (2021-03-25T06:14:18Z)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
- KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning [4.787501955202053]
In the visual commonsense reasoning (VCR) task, a machine must answer correctly and then provide a rationale justifying its answer.
We propose a novel Knowledge Enhanced Visual-and-Linguistic BERT (KVL-BERT for short) model.
Besides taking visual and linguistic contents as input, external commonsense knowledge extracted from ConceptNet is integrated into the multi-layer Transformer.
arXiv Detail & Related papers (2020-12-13T08:22:33Z)