Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced
Language Model Pre-training
- URL: http://arxiv.org/abs/2010.12688v2
- Date: Sat, 13 Mar 2021 18:25:01 GMT
- Title: Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced
Language Model Pre-training
- Authors: Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou
- Abstract summary: We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
- Score: 22.534866015730664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work on Data-To-Text Generation, the task of converting knowledge graph
(KG) triples into natural text, focused on domain-specific benchmark datasets.
In this paper, however, we verbalize the entire English Wikidata KG, and
discuss the unique challenges associated with a broad, open-domain, large-scale
verbalization. We further show that verbalizing a comprehensive, encyclopedic
KG like Wikidata can be used to integrate structured KGs and natural language
corpora. In contrast to the many architectures that have been developed to
integrate these two sources, our approach converts the KG into natural text,
allowing it to be seamlessly integrated into existing language models. It
carries the further advantages of improved factual accuracy and reduced
toxicity in the resulting language model. We evaluate this approach by
augmenting the retrieval corpus in a retrieval language model and showing
significant improvements on the knowledge intensive tasks of open domain QA and
the LAMA knowledge probe.
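The idea in the abstract (turn KG triples into sentences, then add those sentences to a retrieval corpus) can be illustrated with a minimal sketch. This is not the paper's verbalization system, which the abstract does not detail; it swaps in a naive fixed template purely for illustration. The triples, the function names `group_by_subject`, `verbalize`, and `augment_corpus`, and the sample passage are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triples; entity and relation labels are illustrative only.
TRIPLES = [
    ("Marie Curie", "occupation", "physicist"),
    ("Marie Curie", "country of citizenship", "Poland"),
    ("Warsaw", "country", "Poland"),
]


def group_by_subject(triples):
    """Collect (relation, object) pairs under each subject entity."""
    grouped = defaultdict(list)
    for subj, rel, obj in triples:
        grouped[subj].append((rel, obj))
    return dict(grouped)


def verbalize(subject, facts):
    """Naive template (an assumption): 'The <relation> of <subject> is <object>.'"""
    return " ".join(f"The {rel} of {subject} is {obj}." for rel, obj in facts)


def augment_corpus(text_corpus, triples):
    """Return the original passages plus one synthetic passage per subject."""
    grouped = group_by_subject(triples)
    synthetic = [verbalize(subj, facts) for subj, facts in grouped.items()]
    return list(text_corpus) + synthetic


corpus = augment_corpus(["Poland is a country in Central Europe."], TRIPLES)
for passage in corpus:
    print(passage)
```

Grouping triples by subject before verbalizing keeps related facts in one passage, so a retriever could surface them together; a learned generator would replace `verbalize` in any realistic setting.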
Related papers
- Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base.
We ground generation of KG with the authored ontology based on extracted relations to ensure consistency and interpretability.
Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs.
Date: 2024-12-30T13:36:05Z
- Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation [58.65698688443091]
We propose SUbgraph Retrieval-augmented GEneration (SURGE), a framework for generating context-relevant and knowledge-grounded dialogues with Knowledge Graphs (KGs).
Our framework first retrieves the relevant subgraph from the KG, and then enforces consistency across facts by perturbing their word embeddings conditioned by the retrieved subgraph.
We validate our SURGE framework on OpendialKG and KOMODIS datasets, showing that it generates high-quality dialogues that faithfully reflect the knowledge from KG.
Date: 2023-05-30T08:36:45Z
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
Date: 2022-10-17T18:02:52Z
- WDV: A Broad Data Verbalisation Dataset Built from Wikidata [5.161088104035106]
Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text.
We propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text.
We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores.
Date: 2022-05-05T13:10:12Z
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
Date: 2022-01-13T08:25:53Z
- KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on a fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
Date: 2021-09-09T12:39:17Z
- Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models [42.38563175680914]
This paper studies how to automatically generate natural language text that describes the facts in a knowledge graph (KG).
Considering the few-shot setting, we leverage the excellent capacities of pretrained language models (PLMs) in language understanding and generation.
Date: 2021-06-03T06:48:00Z
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding [73.43768772121985]
We propose a novel joint pre-training framework, JAKET, to model both the knowledge graph and language.
The knowledge module and language module provide essential information to mutually assist each other.
Our design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains.
Date: 2020-10-02T05:53:36Z
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
Date: 2020-04-29T14:22:42Z