CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training
- URL: http://arxiv.org/abs/2006.04702v3
- Date: Wed, 9 Dec 2020 19:29:27 GMT
- Title: CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training
- Authors: Qipeng Guo, Zhijing Jin, Xipeng Qiu, Weinan Zhang, David Wipf, Zheng Zhang
- Abstract summary: Deep learning models for graph-to-text (G2T) and text-to-graph (T2G) conversion suffer from scarce training data.
We present CycleGT, an unsupervised training method that can bootstrap from non-parallel graph and text data and iteratively back-translate between the two forms.
- Score: 63.11444020743543
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Two important tasks at the intersection of knowledge graphs and natural
language processing are graph-to-text (G2T) and text-to-graph (T2G) conversion.
Due to the difficulty and high cost of data collection, the supervised data
available in the two fields are usually on the order of tens of thousands of
examples, for example, 18K in the WebNLG 2017 dataset after preprocessing,
which is far fewer than the millions of examples available for other tasks
such as machine translation. Consequently, deep learning models for G2T and
T2G suffer greatly from scarce training data. We present CycleGT, an
unsupervised training method that can bootstrap from fully non-parallel graph
and text data and iteratively back-translate between the two forms.
Experiments on WebNLG datasets show that our unsupervised model, trained on
the same amount of data, achieves performance on par with several fully
supervised models. Further experiments on the non-parallel GenWiki dataset
verify that our method performs best among unsupervised baselines. This
validates our framework as an effective approach to overcoming the data
scarcity problem in the fields of G2T and T2G. Our code is
available at https://github.com/QipengGuo/CycleGT.
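As a reading aid, here is a minimal sketch of the iterative back-translation loop the abstract describes. All interfaces below (g2t, t2g, and the two training callbacks) are hypothetical placeholders for illustration, not the API of the released code:

```python
# Minimal sketch of CycleGT-style cycle training: two cycles, each pairing a
# pseudo input produced by one model with a real output that supervises the
# other model. Every name here is a hypothetical placeholder.
from typing import Callable, Iterable

def cycle_train(
    texts: Iterable[str],                    # non-parallel text corpus
    graphs: Iterable[list],                  # non-parallel graph corpus (lists of triples)
    g2t: Callable[[list], str],              # graph-to-text model, inference mode
    t2g: Callable[[str], list],              # text-to-graph model, inference mode
    train_g2t: Callable[[list, str], None],  # one supervised update on a (graph, text) pair
    train_t2g: Callable[[str, list], None],  # one supervised update on a (text, graph) pair
    epochs: int = 10,
) -> None:
    for _ in range(epochs):
        # Text cycle: text -> pseudo graph, then train G2T to reconstruct the real text.
        for x in texts:
            train_g2t(t2g(x), x)
        # Graph cycle: graph -> pseudo text, then train T2G to reconstruct the real graph.
        for g in graphs:
            train_t2g(g2t(g), g)
```

The key property is that every supervised update pairs a model-generated pseudo input with a real target, so no parallel graph-text pairs are ever required.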
Related papers
- Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model [4.474834288759608]
Graph-to-Text (G2T) generation involves verbalizing structured graphs into natural language.
The scarcity of high-quality, general-domain G2T datasets restricts progress in general-domain G2T research.
We introduce Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method.
arXiv Detail & Related papers (2024-09-11T08:16:20Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) of a pre-trained LM on the downstream task.
We then generate node embeddings from the last hidden states of the fine-tuned LM; a minimal sketch of this embedding step appears after this list.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- Faithful Low-Resource Data-to-Text Generation through Cycle Training [14.375070014155817]
Methods to generate text from structured data have advanced significantly in recent years.
Cycle training uses two models which are inverses of each other.
We show that cycle training achieves nearly the same performance as fully supervised approaches.
arXiv Detail & Related papers (2023-05-24T06:44:42Z)
- INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion [43.70416280548082]
Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction are essential tasks for constructing and applying knowledge graphs.
Existing unsupervised approaches are suitable candidates for jointly learning the two tasks because they avoid using graph-text parallel data.
We propose INFINITY, a simple yet effective unsupervised approach that does not require external annotation tools or additional parallel information.
arXiv Detail & Related papers (2022-09-22T03:12:43Z)
- Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets from images.
Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding.
We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large-scale SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z)
- A multi-task semi-supervised framework for Text2Graph & Graph2Text [2.2344764434954256]
We jointly learn graph extraction from text and text generation from graphs.
Our approach surpasses unsupervised state-of-the-art results in text-to-graph and graph-to-text.
The resulting model can be easily trained in any new domain with non-parallel data.
arXiv Detail & Related papers (2022-02-12T11:02:17Z)
- EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation [8.216976747904726]
EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, 6 times larger than the current largest parallel dataset.
Our aim is two-fold: to help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset.
arXiv Detail & Related papers (2021-10-30T15:39:20Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training method (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
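As referenced in the SimTeG entry above, here is a minimal sketch of turning a fine-tuned LM's last hidden states into node embeddings. The model name and the mean-pooling choice are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch (not the authors' code): mean-pool the last hidden states of
# a (nominally PEFT fine-tuned) LM to get one embedding per graph node.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in for the fine-tuned LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

node_texts = ["text attached to node 0", "text attached to node 1"]

with torch.no_grad():
    batch = tokenizer(node_texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (num_nodes, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding positions
    node_emb = (hidden * mask).sum(1) / mask.sum(1)  # mean pool -> (num_nodes, dim)

# node_emb can then serve as input node features for a downstream GNN.
```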