KgPLM: Knowledge-guided Language Model Pre-training via Generative and
Discriminative Learning
- URL: http://arxiv.org/abs/2012.03551v1
- Date: Mon, 7 Dec 2020 09:39:25 GMT
- Title: KgPLM: Knowledge-guided Language Model Pre-training via Generative and
Discriminative Learning
- Authors: Bin He, Xin Jiang, Jinghui Xiao, Qun Liu
- Abstract summary: We present a language model pre-training framework guided by factual knowledge completion and verification.
Experimental results on LAMA, a set of zero-shot cloze-style question answering tasks, show that our model contains richer factual knowledge than conventional pre-trained language models.
- Score: 45.067001062192844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies on pre-trained language models have demonstrated their
ability to capture factual knowledge and their applicability to knowledge-aware
downstream tasks. In this work, we present a language model pre-training
framework guided by factual knowledge completion and verification, using
generative and discriminative approaches cooperatively to train the model. In
particular, we investigate two learning schemes, the two-tower scheme and the
pipeline scheme, for training the generator and the discriminator with shared
parameters. Experimental results on LAMA, a set of zero-shot cloze-style
question answering tasks, show that our model contains richer factual knowledge
than conventional pre-trained language models. Furthermore, when fine-tuned and
evaluated on the MRQA shared task, which consists of several machine reading
comprehension datasets, our model achieves state-of-the-art performance, with
large improvements on NewsQA (+1.26 F1) and TriviaQA (+1.56 F1) over RoBERTa.
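The abstract names the objectives but not the implementation details. Purely as an illustration, the sketch below gives one ELECTRA-style reading of the pipeline scheme: a shared encoder feeds a generative head that completes masked fact spans and a discriminative head that verifies, token by token, whether the completed sequence was altered. All module names, sizes, and the corruption step are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the pipeline scheme: one encoder shared between a
# generative head (masked knowledge completion) and a discriminative head
# (knowledge verification). Sizes and constants below are arbitrary.
VOCAB, HIDDEN, MASK_ID = 30522, 256, 103

class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

encoder = SharedEncoder()
gen_head = nn.Linear(HIDDEN, VOCAB)   # fills in masked fact spans
disc_head = nn.Linear(HIDDEN, 1)      # flags tokens that were replaced

def pretrain_step(masked_ids, original_ids):
    # Generative objective: complete the masked knowledge spans.
    hidden = encoder(masked_ids)
    logits = gen_head(hidden)
    mask = masked_ids.eq(MASK_ID)
    gen_loss = nn.functional.cross_entropy(logits[mask], original_ids[mask])

    # Corrupt the input with the generator's own predictions.
    with torch.no_grad():
        corrupted = torch.where(mask, logits.argmax(-1), original_ids)

    # Discriminative objective: verify which tokens differ from the original.
    replaced = corrupted.ne(original_ids).float()
    disc_logits = disc_head(encoder(corrupted)).squeeze(-1)
    disc_loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, replaced)
    return gen_loss + disc_loss

ids = torch.randint(0, VOCAB, (2, 16))
masked = ids.clone()
masked[:, 3:6] = MASK_ID              # pretend these positions hold a fact span
print(pretrain_step(masked, ids))
```

In the two-tower reading, the generative and discriminative heads would presumably be trained side by side on the same masked input rather than chained as above; in both cases the shared encoder is where the two training signals meet.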
Related papers
- NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic
Statistical Models and Pre-trained Language Models [4.329463429688995]
This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023.
For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models.
We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classic statistical models and pre-trained language models.
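The summary mentions consolidating features from several models with learning-to-rank; purely as an illustration (none of this reflects the team's actual features or ranker), a pointwise version of that idea can be as small as fitting a classifier over per-document scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy pointwise learning-to-rank sketch: each candidate document is described
# by scores from two hypothetical models (e.g. a lexical retriever and a PLM
# similarity score), and a simple classifier learns how to combine them.
X = np.array([[2.1, 0.9], [0.3, 0.2], [1.8, 0.4], [0.1, 0.8]])  # made-up scores
y = np.array([1, 0, 1, 0])                                       # 1 = relevant

ranker = LogisticRegression().fit(X, y)
candidates = np.array([[1.5, 0.7], [0.2, 0.3]])
order = np.argsort(-ranker.predict_proba(candidates)[:, 1])
print(order)   # candidate indices, best first
```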
arXiv Detail & Related papers (2023-09-16T18:32:15Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
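To make the two objectives concrete, here is a minimal sketch of how a single commonsense triple could yield one mask-infilling example and one relation-prediction example; the verbalization templates and label set are assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (assumed templates and label set): one commonsense triple
# yields a mask-infilling example and a relation-prediction example, the two
# self-supervised objectives named above.
TEMPLATES = {"UsedFor": "{} is used for {}.", "AtLocation": "{} is located at {}."}
RELATIONS = list(TEMPLATES)

def make_examples(head, relation, tail):
    sentence = TEMPLATES[relation].format(head, tail)
    # Objective 1: commonsense mask infilling -- recover the masked tail span.
    infilling = {"input": sentence.replace(tail, "[MASK]"), "target": tail}
    # Objective 2: commonsense relation prediction -- classify the relation.
    relation_pred = {"input": f"{head} [SEP] {tail}",
                     "label": RELATIONS.index(relation)}
    return infilling, relation_pred

print(make_examples("an umbrella", "UsedFor", "staying dry"))
```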
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Zero-shot Visual Question Answering with Language Model Feedback [83.65140324876536]
We propose a language model guided captioning approach, LAMOC, for knowledge-based visual question answering (VQA).
Our approach uses captions generated by a captioning model as the context for an answer prediction model, which is a pre-trained language model (PLM).
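A stripped-down version of that caption-then-answer pipeline might look as follows; the function bodies are placeholders (a real system plugs in trained captioning and language models, and LAMOC additionally feeds the PLM's feedback back to the captioner).

```python
# Placeholder caption-then-answer sketch; the stubs below stand in for trained
# models and are not LAMOC's actual components.
def caption_model(image, num_captions=3):
    return ["a dog catching a frisbee in a park"] * num_captions  # stub captioner

def plm_answer(question, context):
    return "a frisbee", 0.72   # stub PLM returning (answer, confidence)

def answer(image, question):
    captions = caption_model(image)
    # Each caption serves as context for the PLM; keep the most confident answer.
    scored = [plm_answer(question, c) for c in captions]
    return max(scored, key=lambda pair: pair[1])[0]

print(answer(image=None, question="What is the dog catching?"))
```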
arXiv Detail & Related papers (2023-05-26T15:04:20Z)
- The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources
in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on-the-fly over knowledge observed both at pretraining time and at inference time.
Still, even the best-performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- ANNA: Enhanced Language Representation for Question Answering [5.713808202873983]
We show how these approaches affect performance individually and when they are jointly applied in pre-training models.
We propose an extended pre-training task and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling.
Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet.
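One simple way to picture a neighbor-aware mechanism is as an additive bias that raises attention scores inside a small window around each position; the window size, bias strength, and additive form below are assumptions, not ANNA's exact formulation.

```python
import torch

# Illustrative "neighbor-aware" attention bias: tokens within a small window
# of each query position get their attention scores boosted before softmax.
def neighbor_bias(seq_len, window=2, boost=1.0):
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).abs()
    return (dist <= window).float() * boost

scores = torch.randn(8, 8)                        # raw attention scores, one head
attn = torch.softmax(scores + neighbor_bias(8), dim=-1)
print(attn[0])                                    # neighbors receive extra weight
```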
arXiv Detail & Related papers (2022-03-28T05:26:52Z)
- Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
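A toy version of the cloze-probing step could look like the following, using an off-the-shelf fill-mask pipeline; the templates, model choice, and triple format are illustrative, and the paper goes further by building knowledge-graph extracts across training snapshots.

```python
from transformers import pipeline

# Probe a masked language model with cloze statements and record the top
# completions as (subject, relation, object) edges. Templates are made up.
fill = pipeline("fill-mask", model="distilbert-base-uncased")

templates = [
    ("France", "capital", "The capital of France is [MASK]."),
    ("Dante", "born in", "Dante was born in [MASK]."),
]

triples = []
for subject, relation, cloze in templates:
    top = fill(cloze)[0]                       # highest-scoring completion
    triples.append((subject, relation, top["token_str"].strip()))

print(triples)   # edges of a small knowledge-graph snapshot for this model
```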
arXiv Detail & Related papers (2021-11-16T15:18:01Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
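As a very small stand-in for the parametric/non-parametric split (the real model uses a dense retriever and a seq2seq generator and marginalizes over retrieved documents), a retrieve-then-generate loop looks like this:

```python
# Minimal retrieve-then-generate sketch in the spirit of RAG; the corpus,
# scorer, and generator below are made-up stand-ins.
CORPUS = ["The Eiffel Tower is in Paris.",
          "Mount Everest is the highest mountain on Earth."]

def retrieve(query, k=1):
    # Non-parametric memory: naive lexical-overlap scoring stands in for a
    # dense retriever here.
    overlap = lambda doc: len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(CORPUS, key=overlap, reverse=True)[:k]

def generate(query, docs):
    # Parametric memory: a seq2seq generator would condition on query + docs.
    return f"(answer conditioned on: {docs[0]})"

query = "Where is the Eiffel Tower?"
print(generate(query, retrieve(query)))
```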
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
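The key idea, treating retrieval as a latent variable so the retriever is trained by the same objective as the language model, can be sketched in a few lines; every tensor below is a random stand-in rather than anything from REALM.

```python
import torch

# REALM-style marginalization sketch: p(y|x) = sum_z p(z|x) * p(y|x,z).
# Because the retrieval distribution p(z|x) sits inside the objective,
# gradients flow into the (here randomly initialized) retriever embeddings.
query = torch.randn(128, requires_grad=True)        # query embedding
docs = torch.randn(10, 128, requires_grad=True)     # document embeddings

p_doc = torch.softmax(docs @ query, dim=0)          # p(z|x): latent retriever
p_ans_given_doc = torch.rand(10)                    # stand-in reader scores p(y|x,z)

p_ans = (p_doc * p_ans_given_doc).sum()             # marginalize over documents
loss = -torch.log(p_ans)
loss.backward()
print(query.grad.norm())                            # retriever receives gradient
```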
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.