Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- URL: http://arxiv.org/abs/2010.12688v2
- Date: Sat, 13 Mar 2021 18:25:01 GMT
- Title: Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- Authors: Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou
- Abstract summary: We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
- Score: 22.534866015730664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work on Data-To-Text Generation, the task of converting knowledge graph
(KG) triples into natural text, focused on domain-specific benchmark datasets.
In this paper, however, we verbalize the entire English Wikidata KG, and
discuss the unique challenges associated with a broad, open-domain, large-scale
verbalization. We further show that verbalizing a comprehensive, encyclopedic
KG like Wikidata can be used to integrate structured KGs and natural language
corpora. In contrast to the many architectures that have been developed to
integrate these two sources, our approach converts the KG into natural text,
allowing it to be seamlessly integrated into existing language models. It
carries the further advantages of improved factual accuracy and reduced
toxicity in the resulting language model. We evaluate this approach by
augmenting the retrieval corpus in a retrieval language model and showing
significant improvements on the knowledge intensive tasks of open domain QA and
the LAMA knowledge probe.
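The core task the abstract describes is Data-To-Text Generation: turning (subject, predicate, object) triples into natural sentences that a language model can consume. The paper itself fine-tunes a pretrained sequence-to-sequence model on aligned triple-text pairs; the following minimal, hypothetical template-based sketch only illustrates the input/output shape of the task, not the authors' actual method.

```python
def verbalize(triples):
    """Convert (subject, predicate, object) triples into plain-text
    sentences, grouping triples that share a subject so related facts
    land in one sentence (as a KG verbalizer would)."""
    by_subject = {}
    for subj, pred, obj in triples:
        by_subject.setdefault(subj, []).append(f"{pred} {obj}")
    sentences = [
        f"{subj} {', and '.join(facts)}." for subj, facts in by_subject.items()
    ]
    return " ".join(sentences)

# Hypothetical Wikidata-style triples for one entity:
corpus = verbalize([
    ("Marie Curie", "was born in", "Warsaw"),
    ("Marie Curie", "received", "the Nobel Prize in Physics"),
])
print(corpus)
# Marie Curie was born in Warsaw, and received the Nobel Prize in Physics.
```

Text produced this way can be appended to an ordinary pre-training or retrieval corpus, which is how the approach sidesteps the special fusion architectures mentioned in the abstract.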
Related papers
- BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering [6.05977559550463]
Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications.
Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets.
We propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text.
arXiv Detail & Related papers (2024-04-04T15:31:21Z)
- Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation [58.65698688443091]
We propose SUbgraph Retrieval-augmented GEneration (SURGE), a framework for generating context-relevant and knowledge-grounded dialogues with Knowledge Graphs (KGs).
Our framework first retrieves the relevant subgraph from the KG, and then enforces consistency across facts by perturbing their word embeddings conditioned on the retrieved subgraph.
We validate our SURGE framework on OpendialKG and KOMODIS datasets, showing that it generates high-quality dialogues that faithfully reflect the knowledge from KG.
arXiv Detail & Related papers (2023-05-30T08:36:45Z)
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
arXiv Detail & Related papers (2022-10-17T18:02:52Z)
- WDV: A Broad Data Verbalisation Dataset Built from Wikidata [5.161088104035106]
Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text.
We propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text.
We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores.
arXiv Detail & Related papers (2022-05-05T13:10:12Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
- KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on a fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
arXiv Detail & Related papers (2021-09-09T12:39:17Z)
- Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models [42.38563175680914]
This paper studies how to automatically generate a natural language text that describes the facts in a knowledge graph (KG).
Considering the few-shot setting, we leverage the excellent capacities of pretrained language models (PLMs) in language understanding and generation.
arXiv Detail & Related papers (2021-06-03T06:48:00Z)
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding [73.43768772121985]
We propose a novel joint pre-training framework, JAKET, to model both the knowledge graph and language.
The knowledge module and language module provide essential information to mutually assist each other.
Our design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains.
arXiv Detail & Related papers (2020-10-02T05:53:36Z)
- CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.