Interpreting Language Models Through Knowledge Graph Extraction
- URL: http://arxiv.org/abs/2111.08546v1
- Date: Tue, 16 Nov 2021 15:18:01 GMT
- Title: Interpreting Language Models Through Knowledge Graph Extraction
- Authors: Vinitra Swamy, Angelika Romanou, Martin Jaggi
- Abstract summary: We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
- Score: 42.97929497661778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models trained on large text corpora have enjoyed
immense popularity in the natural language processing community and are
commonly used as a starting point for downstream tasks. While these models are
undeniably useful, it is a challenge to quantify their performance beyond
traditional accuracy metrics. In this paper, we compare BERT-based language
models through snapshots of acquired knowledge at sequential stages of the
training process. Structured relationships from training corpora may be
uncovered through querying a masked language model with probing tasks. We
present a methodology to unveil a knowledge acquisition timeline by generating
knowledge graph extracts from cloze "fill-in-the-blank" statements at various
stages of RoBERTa's early training. We extend this analysis to a comparison of
pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa). This
work proposes a quantitative framework to compare language models through
knowledge graph extraction (GED, Graph2Vec) and showcases a part-of-speech
analysis (POSOR) to identify the linguistic strengths of each model variant.
Using these metrics, machine learning practitioners can compare models,
diagnose their models' behavioral strengths and weaknesses, and identify new
targeted datasets to improve model performance.
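As a rough illustration of this probing-to-graph workflow (a minimal sketch, not the authors' exact pipeline; the cloze templates and relation names below are invented for illustration), one can fill masked statements with a masked language model, keep the top predictions as (subject, relation, object) triples, and compare the resulting graphs with graph edit distance:

```python
# Minimal sketch of cloze probing -> knowledge-graph extraction -> graph comparison.
import networkx as nx
from transformers import pipeline

# Illustrative cloze templates; the paper derives its statements from cloze-style probes.
TEMPLATES = [
    ("Paris", "capital-of", "Paris is the capital of [MASK]."),
    ("Dante", "born-in", "Dante was born in [MASK]."),
]

def extract_graph(model_name: str) -> nx.DiGraph:
    """Query a masked LM with cloze statements and keep the top prediction as a triple."""
    unmasker = pipeline("fill-mask", model=model_name)
    graph = nx.DiGraph()
    for subject, relation, template in TEMPLATES:
        top = unmasker(template)[0]  # highest-scoring completion
        graph.add_edge(subject, top["token_str"].strip(),
                       relation=relation, score=top["score"])
    return graph

g_bert = extract_graph("bert-base-uncased")
g_distil = extract_graph("distilbert-base-uncased")

# Graph edit distance (GED) as one structural comparison between the two extracts.
print(nx.graph_edit_distance(g_bert, g_distil))
```

Graph2Vec embeddings of the extracted graphs can be compared in the same spirit to obtain the second metric mentioned in the abstract.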
Related papers
- Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a zero-shot retriever on the recall benchmark.
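A minimal sketch of what combining a language-modeling loss with an in-batch contrastive loss in a single step can look like (GUR's actual objectives, text pairing, and loss weighting may differ):

```python
# Generic LM + InfoNCE-style contrastive objective; not GUR's exact formulation.
import torch
import torch.nn.functional as F

def lm_plus_contrastive_loss(lm_loss, anchor_emb, positive_emb,
                             temperature=0.05, alpha=1.0):
    """lm_loss: scalar LM loss; anchor_emb / positive_emb: (batch, dim) pooled embeddings."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature  # other rows in the batch act as negatives
    labels = torch.arange(logits.size(0), device=logits.device)
    return lm_loss + alpha * F.cross_entropy(logits, labels)
```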
arXiv Detail & Related papers (2023-06-14T06:00:18Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x more parameters.
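A hedged sketch of the general recipe of training on in-context-formatted sequences, assuming a Hugging Face-style tokenizer with an EOS token; PICL's construction of pre-training instances from "intrinsic tasks" is more involved:

```python
# Pack related passages into a single context window to form an in-context-style
# pre-training sequence. Retrieval of related passages is assumed to happen upstream.
from typing import List

def pack_in_context_example(passages: List[str], tokenizer, max_len: int = 1024) -> List[int]:
    """Concatenate related passages, separated by EOS, truncated to max_len tokens."""
    ids: List[int] = []
    for passage in passages:
        ids.extend(tokenizer.encode(passage, add_special_tokens=False))
        ids.append(tokenizer.eos_token_id)  # assumes a GPT-style tokenizer with an EOS id
    return ids[:max_len]
```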
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Pre-trained Language Model with Prompts for Temporal Knowledge Graph Completion [30.50032335014021]
We propose a novel temporal knowledge graph completion (TKGC) model, Pre-trained Language Model with Prompts for TKGC (PPT).
We convert a series of sampled quadruples into pre-trained language model inputs and convert intervals between timestamps into different prompts to make coherent sentences with implicit semantic information.
Our model can effectively incorporate information from temporal knowledge graphs into the language models.
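An illustrative (not the paper's) way to verbalise quadruples and timestamp gaps as masked prompts:

```python
# Turn temporal-KG quadruples (subject, relation, object, timestamp) into cloze-style
# inputs and map timestamp gaps to coarse textual prompts; PPT's actual verbalisation
# and prompt design differ.
def quadruple_to_cloze(subject: str, relation: str, timestamp: str,
                       mask_token: str = "[MASK]") -> str:
    # e.g. ("Barack Obama", "visited", "2015-03-01") ->
    #      "On 2015-03-01, Barack Obama visited [MASK]."
    return f"On {timestamp}, {subject} {relation} {mask_token}."

def interval_prompt(days_between: int) -> str:
    """Map the gap between consecutive timestamps to a connective prompt."""
    if days_between == 0:
        return "On the same day,"
    if days_between < 30:
        return "Shortly afterwards,"
    return "Much later,"
```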
arXiv Detail & Related papers (2023-05-13T12:53:11Z)
- Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese [4.4681678689625715]
We analyse the effect of pre-training with monolingual data for a low-resource language.
We present a newly created corpus for Maltese, and determine the effect that the pre-training data size and domain have on the downstream performance.
We compare two models on the new corpus: a monolingual BERT model trained from scratch (BERTu), and a further pre-trained multilingual BERT (mBERTu).
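The two set-ups can be sketched as follows (corpus preparation, tokenizer training, and hyper-parameters omitted; the vocabulary size is illustrative, not the paper's):

```python
# Continued masked-LM pre-training of multilingual BERT versus a monolingual model
# initialised from scratch.
from transformers import AutoModelForMaskedLM, BertConfig, BertForMaskedLM

# mBERTu-style: start from multilingual BERT and continue MLM training on Maltese text.
mbertu = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# BERTu-style: random initialisation with a freshly trained Maltese vocabulary.
bertu = BertForMaskedLM(BertConfig(vocab_size=52_000))
```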
arXiv Detail & Related papers (2022-05-21T06:44:59Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
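A simplified, deterministic sketch of the blocking idea (the paper's Dynamic Blocking additionally samples which surface forms to block at each step):

```python
# If the decoder has just emitted a token that also occurs in the source, penalise the
# token that follows it in the source, discouraging verbatim copying of the input.
import torch

def block_source_continuations(next_token_logits: torch.Tensor,
                               source_ids: list,
                               last_generated_id: int) -> torch.Tensor:
    """next_token_logits: (vocab_size,) scores for the next decoding step."""
    for i, tok in enumerate(source_ids[:-1]):
        if tok == last_generated_id:
            next_token_logits[source_ids[i + 1]] = float("-inf")
    return next_token_logits
```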
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
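A hedged sketch of one common recipe for this setting, not the paper's exact models: propose candidates from a knowledge-base lexicon and rank them with a character language model:

```python
# Candidate generation from a lexicon plus character-LM reranking.
import difflib

def correct(word: str, lexicon: list, char_lm_logprob) -> str:
    """char_lm_logprob(s) should return the log-probability of string s under a character LM."""
    candidates = difflib.get_close_matches(word, lexicon, n=10, cutoff=0.6)
    return max(candidates, key=char_lm_logprob) if candidates else word
```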
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
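An illustrative probe of this idea with a publicly available code-pretrained masked LM (the model name below is an assumption for illustration; the paper trains and evaluates its own transformer on code-analysis tasks):

```python
# Ask a masked LM trained on code to fill in a masked token in a source snippet.
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")
snippet = "for i in <mask>(len(items)): print(items[i])"
for prediction in fill(snippet)[:3]:
    print(prediction["token_str"], prediction["score"])
```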
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
- ParsBERT: Transformer-based Model for Persian Language Understanding [0.7646713951724012]
This paper proposes a monolingual BERT for the Persian language (ParsBERT).
It achieves state-of-the-art performance compared to other architectures and multilingual models.
ParsBERT obtains higher scores on all datasets, including both existing and newly composed ones.
arXiv Detail & Related papers (2020-05-26T05:05:32Z)
- What the [MASK]? Making Sense of Language-Specific BERT Models [39.54532211263058]
This paper presents the current state of the art in language-specific BERT models.
Our aim is to provide an overview of the commonalities and differences between language-specific BERT models and mBERT models.
arXiv Detail & Related papers (2020-03-05T20:42:51Z)