Partially-Aligned Data-to-Text Generation with Distant Supervision
- URL: http://arxiv.org/abs/2010.01268v1
- Date: Sat, 3 Oct 2020 03:18:52 GMT
- Title: Partially-Aligned Data-to-Text Generation with Distant Supervision
- Authors: Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu
- Abstract summary: We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
- Score: 69.15410325679635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Data-to-Text task aims to generate human-readable text that describes given structured data, enabling more interpretability. However, the typical generation task is confined to a few particular domains since it requires well-aligned data, which is difficult and expensive to obtain. Using partially-aligned data is an alternative way of solving the dataset scarcity problem. This kind of data is much easier to obtain since it can be produced automatically. However, using it induces the over-generation problem, which poses difficulties for existing models: they tend to add unrelated excerpts during generation. In order to effectively utilize automatically annotated partially-aligned datasets, we extend the traditional generation task to a refined task called Partially-Aligned Data-to-Text Generation (PADTG), which is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. To tackle this new task, we propose a novel distant supervision generation framework. It first estimates the input data's supportiveness for each target word with an estimator, and then applies a supportiveness adaptor and a rebalanced beam search to mitigate the over-generation problem in the training and generation phases, respectively. We also contribute a partially-aligned dataset (the data and source code of this paper can be obtained from https://github.com/fuzihaofzh/distant_supervision_nlg) built by sampling sentences from Wikipedia and automatically extracting corresponding KB triples for each sentence from Wikidata. The experimental results show that our framework outperforms all baseline models and verify the feasibility of utilizing partially-aligned data.
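To make the framework's three components concrete, here is a minimal Python sketch. It is illustrative only: the paper learns supportiveness with a dedicated estimator model, whereas this sketch approximates it with lexical overlap against the linearized input triples, and the names (`estimate_supportiveness`, `adapted_nll`, `rebalanced_beam_step`) and the mixing weight `alpha` are hypothetical, not taken from the released code.

```python
import math

def estimate_supportiveness(target_tokens, triple_tokens):
    """Stand-in estimator: score each target token by whether it appears
    in the linearized input triples. (The paper trains a model for this;
    lexical overlap is only a rough proxy.)"""
    support = set(triple_tokens)
    return [1.0 if tok in support else 0.1 for tok in target_tokens]

def adapted_nll(token_logprobs, supportiveness):
    """Supportiveness adaptor (sketch): weight each target token's negative
    log-likelihood so that unsupported tokens contribute less to the
    training loss, discouraging the model from memorizing unrelated text."""
    return -sum(w * lp for w, lp in zip(token_logprobs, supportiveness))

def rebalanced_beam_step(beams, expand, supportiveness_of, alpha=0.5, beam_size=4):
    """Rebalanced beam search (sketch): rescore each candidate expansion by
    mixing the model log-probability with a supportiveness bonus, so that
    input-unsupported continuations are penalized at decoding time.
    `expand(prefix)` yields (token, logprob) pairs from the model."""
    scored = []
    for prefix, score in beams:
        for tok, logprob in expand(prefix):
            bonus = alpha * math.log(max(supportiveness_of(tok), 1e-6))
            scored.append((prefix + [tok], score + logprob + bonus))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:beam_size]
```

The same supportiveness signal thus acts twice, reweighting the loss during training and rescoring hypotheses during decoding, which is the sense in which the framework addresses over-generation in both phases.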
Related papers
- Faithful Low-Resource Data-to-Text Generation through Cycle Training [14.375070014155817]
Methods to generate text from structured data have advanced significantly in recent years.
Cycle training uses two models that are inverses of each other (see the sketch after this list).
We show that cycle training achieves nearly the same performance as fully supervised approaches.
arXiv Detail & Related papers (2023-05-24T06:44:42Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets in images.
Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding.
We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large-scale SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z)
- Robust Document Representations using Latent Topics and Metadata [17.306088038339336]
We propose a novel approach to fine-tuning a pre-trained neural language model for document classification problems.
We generate document representations that capture both text and metadata artifacts in a task-specific manner.
Our solution also incorporates metadata explicitly rather than just augmenting the text with it.
arXiv Detail & Related papers (2020-10-23T21:52:38Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
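For the cycle-training entry above, the following minimal sketch shows the core loop under stated assumptions: `d2t` and `t2d` are hypothetical wrappers around two mutually inverse models (data-to-text and text-to-data) exposing `generate` and `loss` methods; the cited paper's actual training schedule and interfaces differ.

```python
def cycle_training_step(d2t, t2d, data_batch, text_batch):
    """One cycle-training step (sketch): each model's output is fed to its
    inverse, and the reconstruction losses supervise both models, so
    unpaired data and unpaired text can both be used for training."""
    # Data -> text -> data: the generated text must let the inverse
    # model reconstruct the original structured input.
    pseudo_text = d2t.generate(data_batch)
    loss_data_cycle = t2d.loss(pseudo_text, target=data_batch)

    # Text -> data -> text: the extracted data must let the forward
    # model reconstruct the original sentence.
    pseudo_data = t2d.generate(text_batch)
    loss_text_cycle = d2t.loss(pseudo_data, target=text_batch)

    return loss_data_cycle + loss_text_cycle
```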