Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model
- URL: http://arxiv.org/abs/2409.07088v1
- Date: Wed, 11 Sep 2024 08:16:20 GMT
- Title: Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model
- Authors: Daehee Kim, Deokhyung Kang, Sangwon Ryu, Gary Geunbae Lee
- Abstract summary: Graph-to-Text (G2T) generation involves verbalizing structured graphs into natural language.
The scarcity of high-quality, general-domain G2T datasets restricts progress in general-domain G2T generation research.
We introduce Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method.
- Score: 4.474834288759608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge Graph-to-Text (G2T) generation involves verbalizing structured knowledge graphs into natural language text. Recent advancements in Pretrained Language Models (PLMs) have improved G2T performance, but their effectiveness depends on datasets with precise graph-text alignment. However, the scarcity of high-quality, general-domain G2T generation datasets restricts progress in general-domain G2T generation research. To address this issue, we introduce the Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method that leverages a Large Language Model (LLM) and Data-QuestEval. Our new dataset, which contains 5.85M general-domain graph-text pairs, offers high graph-text consistency without relying on external ontologies. Experimental results demonstrate that PLMs fine-tuned on WikiOFGraph outperform those trained on other datasets across various evaluation metrics. Our method proves to be a scalable and effective solution for generating high-quality G2T data, significantly advancing the field of G2T generation.
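As a rough illustration of the synthesize-then-filter recipe described above, the sketch below prompts an LLM to extract ontology-free triples from a Wikipedia sentence and keeps only the graph-text pairs that clear a Data-QuestEval-style consistency threshold. The prompt wording, the `call_llm` and `score_consistency` callables, and the 0.5 threshold are illustrative assumptions, not the authors' released pipeline.

```python
# Hedged sketch of an LLM-driven graph-text synthesis-and-filtering loop.
# `call_llm` stands in for any chat-completion API and `score_consistency`
# for a Data-QuestEval-style QA-based consistency metric; both are assumptions.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

TRIPLE_PROMPT = (
    "Extract the factual (subject, relation, object) triples expressed in the "
    "following sentence. Return one triple per line as: subject | relation | object.\n\n"
    "Sentence: {sentence}"
)

def extract_triples(sentence: str, call_llm) -> List[Triple]:
    """Ask the LLM to express the sentence's facts as ontology-free triples."""
    raw = call_llm(TRIPLE_PROMPT.format(sentence=sentence))
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def build_pairs(sentences: List[str], call_llm, score_consistency,
                threshold: float = 0.5) -> List[Tuple[List[Triple], str]]:
    """Keep only (graph, text) pairs whose consistency score clears the threshold."""
    pairs = []
    for sentence in sentences:
        triples = extract_triples(sentence, call_llm)
        if triples and score_consistency(triples, sentence) >= threshold:
            pairs.append((triples, sentence))
    return pairs
```

A PLM such as T5 could then be fine-tuned on the surviving pairs as an ordinary sequence-to-sequence task, with the linearized triples as input and the sentence as the target.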
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
arXiv Detail & Related papers (2024-04-28T18:36:59Z)
- LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data.
LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks.
Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) of a pre-trained LM on the downstream task.
We then generate node embeddings from the last hidden states of the fine-tuned LM (a minimal sketch of this two-step recipe appears after this list).
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- A Robust Stacking Framework for Training Deep Graph Models with Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
However, the models that perform best on such features in standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z)
- EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation [8.216976747904726]
EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, 6 times larger than the current largest parallel dataset.
Our aim is two-fold: to help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset.
arXiv Detail & Related papers (2021-10-30T15:39:20Z)
- WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset [37.22405455503238]
Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences).
Our new dataset WikiGraphs is collected by pairing each Wikipedia article with a subgraph from the Freebase knowledge graph.
Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets.
arXiv Detail & Related papers (2021-07-20T15:18:30Z)
- Stage-wise Fine-tuning for Graph-to-Text Generation [25.379346921398326]
Graph-to-text generation has benefited from pre-trained language models (PLMs) in achieving better performance than structured graph encoders.
We propose a structured graph-to-text model with a two-step fine-tuning mechanism that first fine-tunes the model on Wikipedia before adapting it to graph-to-text generation.
arXiv Detail & Related papers (2021-05-17T17:15:29Z)
- CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training [63.11444020743543]
Deep learning models for graph-to-text (G2T) and text-to-graph (T2G) conversion suffer from scarce training data.
We present CycleGT, an unsupervised training method that bootstraps from non-parallel graph and text data and iteratively back-translates between the two forms (a minimal back-translation loop is sketched after this list).
arXiv Detail & Related papers (2020-06-08T15:59:00Z)
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity [3.8673630752805432]
We present DataTuner, a neural, end-to-end data-to-text generation system.
We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier.
We show that DataTuner achieves state-of-the-art results on automated metrics across four major data-to-text (D2T) datasets.
arXiv Detail & Related papers (2020-04-08T11:16:53Z)
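For the SimTeG entry above, a minimal sketch of that two-step recipe follows: attach a LoRA adapter to a pre-trained LM, fine-tune it on the downstream task (the training loop is elided here), then mean-pool the fine-tuned LM's last hidden states as node embeddings for a downstream GNN. The backbone checkpoint, LoRA hyperparameters, and pooling choice are assumptions for illustration rather than the authors' configuration.

```python
# Hedged SimTeG-style sketch: PEFT fine-tune an LM (elided), then reuse its
# last hidden states as node features. Checkpoint and LoRA settings are assumed.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

checkpoint = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = get_peft_model(
    AutoModel.from_pretrained(checkpoint),
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["query", "value"]),
)
# ... step 1: supervised PEFT fine-tuning on the downstream node task goes here ...

@torch.no_grad()
def node_embeddings(node_texts):
    """Step 2: mean-pool last hidden states of the (fine-tuned) LM as node features."""
    batch = tokenizer(node_texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # [nodes, tokens, dim]
    mask = batch["attention_mask"].unsqueeze(-1)     # [nodes, tokens, 1]
    return (hidden * mask).sum(1) / mask.sum(1)      # [nodes, dim]
```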
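For the CycleGT entry, the back-translation loop referenced there can be sketched as follows: each direction turns unpaired data into pseudo-parallel pairs that supervise the opposite direction. The model and training-step callables are placeholders, not the released CycleGT code.

```python
# Hedged sketch of a CycleGT-style cycle-training loop over non-parallel data.
# `g2t`, `t2g`, `train_step_g2t`, and `train_step_t2g` are placeholder callables.
def cycle_train(graphs, texts, g2t, t2g, train_step_g2t, train_step_t2g, epochs=10):
    for _ in range(epochs):
        # Text side: parse unpaired text into a pseudo graph, then train the
        # G2T model to reconstruct the original text from that graph.
        for text in texts:
            pseudo_graph = t2g(text)
            train_step_g2t(pseudo_graph, target_text=text)
        # Graph side: verbalize an unpaired graph into pseudo text, then train
        # the T2G model to recover the original graph from that text.
        for graph in graphs:
            pseudo_text = g2t(graph)
            train_step_t2g(pseudo_text, target_graph=graph)
```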