EventNarrative: A large-scale Event-centric Dataset for Knowledge
Graph-to-Text Generation
- URL: http://arxiv.org/abs/2111.00276v1
- Date: Sat, 30 Oct 2021 15:39:20 GMT
- Title: EventNarrative: A large-scale Event-centric Dataset for Knowledge
Graph-to-Text Generation
- Authors: Anthony Colas, Ali Sadeghian, Yue Wang, Daisy Zhe Wang
- Abstract summary: EventNarrative consists of approximately 230,000 graphs and their corresponding natural language texts, making it 6 times larger than the current largest parallel dataset.
Our aim is two-fold: to help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset.
- Score: 8.216976747904726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce EventNarrative, a knowledge graph-to-text dataset from publicly
available open-world knowledge graphs. Given the recent advances in
event-driven Information Extraction (IE), and that prior research on
graph-to-text only focused on entity-driven KGs, this paper focuses on
event-centric data. However, our data generation system can still be adapted to
other types of KG data. Existing large-scale datasets in the
graph-to-text area are non-parallel, meaning there is a large disconnect
between the KGs and the text. The datasets that do pair KGs with text are
small-scale and either manually generated or generated without a rich ontology, making
the corresponding graphs sparse. Furthermore, these datasets contain many
unlinked entities between their KG and text pairs. EventNarrative consists of
approximately 230,000 graphs and their corresponding natural language text, 6
times larger than the current largest parallel dataset. It makes use of a rich
ontology, all of the KGs' entities are linked to the text, and our manual
annotations confirm high data quality. Our aim is two-fold: to help break new
ground in event-centric research where data is lacking, and to give researchers
a well-defined, large-scale dataset in order to better evaluate existing and
future knowledge graph-to-text models. We also evaluate two types of baselines
on EventNarrative: a graph-to-text-specific model and two state-of-the-art
language models, which previous work has shown to be adaptable to the knowledge
graph-to-text domain.
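
As a rough illustration of the second type of baseline, the sketch below follows the common recipe from prior work for adapting a pretrained sequence-to-sequence language model to KG-to-text: linearize a graph's triples into a marked-up token sequence and fine-tune on the paired text. The event triples, target sentence, and marker tokens here are hypothetical stand-ins, not actual EventNarrative fields (Python, using the Hugging Face transformers library):

    import torch
    from transformers import BartTokenizer, BartForConditionalGeneration

    # Hypothetical EventNarrative-style pair: event-centric triples and their text.
    triples = [
        ("2010 Haiti earthquake", "instance of", "earthquake"),
        ("2010 Haiti earthquake", "location", "Haiti"),
        ("2010 Haiti earthquake", "point in time", "12 January 2010"),
    ]
    target_text = "The 2010 Haiti earthquake struck Haiti on 12 January 2010."

    # Linearize the graph with head/relation/tail markers, one common scheme.
    source_text = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
    tokenizer.add_tokens(["<H>", "<R>", "<T>"])  # register the graph markers
    model.resize_token_embeddings(len(tokenizer))

    inputs = tokenizer(source_text, return_tensors="pt")
    labels = tokenizer(target_text, return_tensors="pt").input_ids

    # One gradient step; a real run would batch and loop over all ~230,000 pairs.
    loss = model(**inputs, labels=labels).loss
    loss.backward()

At inference time, generation would call model.generate on the linearized input, with the output compared against the reference text using standard metrics such as BLEU.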
Related papers
- Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model [4.474834288759608]
Graph-to-Text (G2T) generation involves verbalizing structured graphs into natural language.
The scarcity of high-quality, general-domain G2T generation datasets restricts progress in general-domain G2T generation research.
We introduce Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method.
arXiv Detail & Related papers (2024-09-11T08:16:20Z)
- iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models [0.7165255458140439]
iText2KG is a method for incremental, topic-independent Knowledge Graph construction without post-processing.
Our method demonstrates superior performance compared to baseline methods across three scenarios.
arXiv Detail & Related papers (2024-09-05T06:49:14Z)
- Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z)
- Hierarchical Compression of Text-Rich Graphs via Large Language Models [63.75293588479027]
Text-rich graphs are prevalent in data mining contexts like e-commerce and academic graphs.
This paper introduces "Hierarchical Compression" (HiCom), a novel method to align the capabilities of LLMs with the structure of text-rich graphs.
HiCom can outperform both GNNs and LLM backbones for node classification on e-commerce and citation graphs.
arXiv Detail & Related papers (2024-06-13T07:24:46Z)
- Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs [4.56877715768796]
We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge.
We also show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
arXiv Detail & Related papers (2023-07-14T12:45:03Z)
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP; a minimal sketch of this objective appears after this list.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
arXiv Detail & Related papers (2022-10-17T18:02:52Z)
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graphs.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
- CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training [63.11444020743543]
Deep learning models for graph-to-text (G2T) and text-to-graph (T2G) conversion suffer from scarce training data.
We present CycleGT, an unsupervised training method that can bootstrap from non-parallel graph and text data, and iteratively back translate between the two forms.
arXiv Detail & Related papers (2020-06-08T15:59:00Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge is often more than enough, since the output description may only need to cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
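
For concreteness, below is a minimal sketch of the batch-wise, CLIP-inspired contrastive objective described in the ConGraT entry above, assuming the LM and GNN each produce one embedding per node; the function name, batch size, embedding dimension, and temperature are placeholder assumptions, not details from the paper (Python, PyTorch):

    import torch
    import torch.nn.functional as F

    def congrat_style_loss(text_emb, node_emb, temperature=0.07):
        # text_emb, node_emb: [batch, dim]; row i of each is a positive pair.
        text_emb = F.normalize(text_emb, dim=-1)
        node_emb = F.normalize(node_emb, dim=-1)
        logits = text_emb @ node_emb.t() / temperature  # pairwise cosine similarities
        targets = torch.arange(logits.size(0))          # diagonal entries are positives
        # Symmetric cross-entropy (text->node and node->text), as in CLIP.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # Random stand-ins for a batch of LM text embeddings and GNN node embeddings.
    loss = congrat_style_loss(torch.randn(8, 256, requires_grad=True),
                              torch.randn(8, 256))
    loss.backward()

Each row pairs a text embedding with its node's GNN embedding; the symmetric cross-entropy pulls matched pairs together in the shared latent space and pushes mismatched batch entries apart.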
This list is automatically generated from the titles and abstracts of the papers on this site.