Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- URL: http://arxiv.org/abs/2010.12688v2
- Date: Sat, 13 Mar 2021 18:25:01 GMT
- Title: Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
- Authors: Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou
- Abstract summary: We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
- Score: 22.534866015730664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work on Data-To-Text Generation, the task of converting knowledge graph
(KG) triples into natural text, focused on domain-specific benchmark datasets.
In this paper, however, we verbalize the entire English Wikidata KG, and
discuss the unique challenges associated with a broad, open-domain, large-scale
verbalization. We further show that verbalizing a comprehensive, encyclopedic
KG like Wikidata can be used to integrate structured KGs and natural language
corpora. In contrast to the many architectures that have been developed to
integrate these two sources, our approach converts the KG into natural text,
allowing it to be seamlessly integrated into existing language models. It
carries the further advantages of improved factual accuracy and reduced
toxicity in the resulting language model. We evaluate this approach by
augmenting the retrieval corpus in a retrieval language model and showing
significant improvements on the knowledge intensive tasks of open domain QA and
the LAMA knowledge probe.
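The core task the abstract describes is Data-To-Text Generation: turning (subject, predicate, object) triples into natural sentences that a language model can consume. The paper itself fine-tunes a pretrained sequence-to-sequence model on aligned triple-text pairs; the following minimal, hypothetical template-based sketch only illustrates the input/output shape of the task, not the authors' actual method.

```python
def verbalize(triples):
    """Convert (subject, predicate, object) triples into plain-text
    sentences, grouping triples that share a subject so related facts
    land in one sentence (as a KG verbalizer would)."""
    by_subject = {}
    for subj, pred, obj in triples:
        by_subject.setdefault(subj, []).append(f"{pred} {obj}")
    sentences = [
        f"{subj} {', and '.join(facts)}." for subj, facts in by_subject.items()
    ]
    return " ".join(sentences)

# Hypothetical Wikidata-style triples for one entity:
corpus = verbalize([
    ("Marie Curie", "was born in", "Warsaw"),
    ("Marie Curie", "received", "the Nobel Prize in Physics"),
])
print(corpus)
# Marie Curie was born in Warsaw, and received the Nobel Prize in Physics.
```

Text produced this way can be appended to an ordinary pre-training or retrieval corpus, which is how the approach sidesteps the special fusion architectures mentioned in the abstract.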
Related papers
- BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering [6.05977559550463]
Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications.
Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets.
We propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text.
arXiv Detail & Related papers (2024-04-04T15:31:21Z)
- Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation [58.65698688443091]
We propose SUbgraph Retrieval-augmented GEneration (SURGE), a framework for generating context-relevant and knowledge-grounded dialogues with Knowledge Graphs (KGs).
Our framework first retrieves the relevant subgraph from the KG, and then enforces consistency across facts by perturbing their word embeddings conditioned on the retrieved subgraph.
We validate our SURGE framework on OpendialKG and KOMODIS datasets, showing that it generates high-quality dialogues that faithfully reflect the knowledge from KG.
arXiv Detail & Related papers (2023-05-30T08:36:45Z)
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
arXiv Detail & Related papers (2022-10-17T18:02:52Z)
- WDV: A Broad Data Verbalisation Dataset Built from Wikidata [5.161088104035106]
Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text.
We propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text.
We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores.
arXiv Detail & Related papers (2022-05-05T13:10:12Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
- KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on a fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
arXiv Detail & Related papers (2021-09-09T12:39:17Z)
- Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models [42.38563175680914]
This paper studies how to automatically generate a natural language text that describes the facts in a knowledge graph (KG).
Considering the few-shot setting, we leverage the excellent capacities of pretrained language models (PLMs) in language understanding and generation.
arXiv Detail & Related papers (2021-06-03T06:48:00Z)
- JAKET: Joint Pre-training of Knowledge Graph and Language Understanding [73.43768772121985]
We propose a novel joint pre-training framework, JAKET, to model both the knowledge graph and language.
The knowledge module and language module provide essential information to mutually assist each other.
Our design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains.
arXiv Detail & Related papers (2020-10-02T05:53:36Z)
- CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.