WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset
- URL: http://arxiv.org/abs/2107.09556v1
- Date: Tue, 20 Jul 2021 15:18:30 GMT
- Title: WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset
- Authors: Luyu Wang, Yujia Li, Ozlem Aslan, Oriol Vinyals
- Abstract summary: Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences).
Our new dataset WikiGraphs is collected by pairing each Wikipedia article with a subgraph from the Freebase knowledge graph.
Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a new dataset of Wikipedia articles each paired with a knowledge
graph, to facilitate the research in conditional text generation, graph
generation and graph representation learning. Existing graph-text paired
datasets typically contain small graphs and short text (one or a few
sentences), thus limiting the capabilities of the models that can be learned on the data.
Our new dataset WikiGraphs is collected by pairing each Wikipedia article from
the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph
from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy
to benchmark against other state-of-the-art text generative models that are
capable of generating long paragraphs of coherent text. Both the graphs and the
text data are of significantly larger scale compared to prior graph-text paired
datasets. We present baseline graph neural network and transformer model
results on our dataset for 3 tasks: graph -> text generation, graph -> text
retrieval and text -> graph retrieval. We show that better conditioning on the
graph provides gains in generation and retrieval quality but there is still
large room for improvement.
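The three benchmark tasks above can be illustrated with a toy retrieval setup. This is a hypothetical sketch, not the paper's baselines: the actual WikiGraphs models use graph neural network and transformer encoders over full articles and Freebase subgraphs, whereas here each graph and article is a stand-in embedding vector and the similarity model is plain cosine score. All names and numbers are invented.

```python
# Toy graph <-> text retrieval: given an encoder output for each graph
# and each article, rank candidates by cosine similarity.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend encoder outputs: one embedding per Freebase subgraph,
# one per Wikipedia article (values are made up).
graph_embs = {"g0": [1.0, 0.0, 0.2], "g1": [0.1, 1.0, 0.0]}
text_embs = {"t0": [0.9, 0.1, 0.1], "t1": [0.0, 1.0, 0.1]}

def retrieve_text(graph_id):
    """graph -> text retrieval: return the best-matching article id."""
    g = graph_embs[graph_id]
    return max(text_embs, key=lambda t: cosine(g, text_embs[t]))

def retrieve_graph(text_id):
    """text -> graph retrieval: return the best-matching graph id."""
    t = text_embs[text_id]
    return max(graph_embs, key=lambda g: cosine(t, graph_embs[g]))
```

The paper's finding that "better conditioning on the graph provides gains" corresponds here to how well the two encoders place paired graphs and articles near each other in the shared embedding space.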
Related papers
- Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model
Graph-to-Text (G2T) generation involves verbalizing structured graphs into natural language.
The scarcity of high-quality, general-domain G2T generation datasets restricts progress in general-domain G2T generation research.
We introduce Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method.
arXiv Detail & Related papers (2024-09-11T08:16:20Z)
- Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning
We propose a unified unsupervised autoencoder framework named Node Level Graph AutoEncoder (NodeGAE).
We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction.
Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z)
- TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models
TAGLAS is an atlas of text-attributed graph (TAG) datasets and benchmarks.
We collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs.
We provide a standardized, efficient, and simplified way to load all datasets and tasks.
arXiv Detail & Related papers (2024-06-20T19:11:35Z)
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
- GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models
We present a benchmark dataset for evaluating the integration of graph knowledge into language models.
The proposed dataset is designed to evaluate graph-language models' ability to understand graphs and make use of them for answer generation.
We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.
arXiv Detail & Related papers (2023-10-12T16:46:58Z)
- Improving Graph-Based Text Representations with Character and Word Level N-grams
We propose a new word-character text graph that combines word and character n-gram nodes together with document nodes.
We also propose two new graph-based neural models, WCTextGCN and WCTextGAT, for modeling our proposed text graph.
arXiv Detail & Related papers (2022-10-12T08:07:54Z)
- Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning
We study pre-trained language models that generate explanation graphs in an end-to-end manner.
We propose simple yet effective ways of graph perturbations via node and edge edit operations.
Our methods lead to significant improvements in both structural and semantic accuracy of explanation graphs.
arXiv Detail & Related papers (2022-04-11T00:58:27Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification
We propose a new method called SHINE, which is based on graph neural network (GNN) for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information from the graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
- Multilevel Graph Matching Networks for Deep Graph Similarity Learning
We propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects.
To compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks.
Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks.
arXiv Detail & Related papers (2020-07-08T19:48:19Z)
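The G-Retriever entry above casts subgraph retrieval as a Prize-Collecting Steiner Tree (PCST) problem: select a connected subgraph that maximizes total node "prize" (e.g. query relevance) minus total edge cost (a penalty on subgraph size). The following is a brute-force illustration on a toy graph, with invented prizes and costs; it is not G-Retriever's actual solver, which handles graphs far too large for enumeration.

```python
# Brute-force Prize-Collecting Steiner Tree on a tiny example graph.
# Objective: maximize sum(prizes of chosen nodes) - cost of a tree
# connecting them. All node names, prizes, and costs are hypothetical.
from itertools import combinations

prizes = {"A": 5.0, "B": 1.0, "C": 4.0, "D": 0.5}
edges = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 3.0, ("C", "D"): 2.0}

def mst_cost(nodes, edges):
    """Kruskal's MST cost over the subgraph induced by `nodes`.
    Returns None if that induced subgraph is disconnected."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    cost, merged = 0.0, 0
    for (u, v), w in sorted(edges.items(), key=lambda e: e[1]):
        if u in nodes and v in nodes:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                cost += w
                merged += 1
    return cost if merged == len(nodes) - 1 else None

def best_pcst(prizes, edges):
    """Enumerate every node subset; keep the connected subset whose
    total prize minus spanning-tree cost is highest."""
    best, best_nodes = float("-inf"), frozenset()
    names = list(prizes)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            c = mst_cost(set(subset), edges)
            if c is None:
                continue  # cannot connect this subset
            score = sum(prizes[v] for v in subset) - c
            if score > best:
                best, best_nodes = score, frozenset(subset)
    return best_nodes, best
```

On this toy instance the optimum keeps the high-prize nodes A and C and the cheap bridge B, while dropping D, whose prize (0.5) does not cover the cost of its edge (2.0); the retrieval analogue is pruning marginally relevant nodes that would bloat the retrieved subgraph.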
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.