Variational Template Machine for Data-to-Text Generation
- URL: http://arxiv.org/abs/2002.01127v2
- Date: Thu, 13 Feb 2020 09:50:56 GMT
- Title: Variational Template Machine for Data-to-Text Generation
- Authors: Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li
- Abstract summary: We claim that an open set of templates is crucial for enriching the phrase constructions and realizing varied generations.
This paper explores the problem of automatically learning reusable "templates" from paired and non-paired data.
We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables.
- Score: 37.03488881357614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to generate descriptions from structured data organized in tables?
Existing approaches using neural encoder-decoder models often suffer from
lacking diversity. We claim that an open set of templates is crucial for
enriching the phrase constructions and realizing varied generations. Learning
such templates is prohibitive since it often requires a large paired <table,
description> corpus, which is seldom available. This paper explores the problem
of automatically learning reusable "templates" from paired and non-paired data.
We propose the variational template machine (VTM), a novel method to generate
text descriptions from data tables. Our contributions include: a) we carefully
devise a specific model architecture and losses to explicitly disentangle text
template and semantic content information in the latent spaces, and b) we
utilize both small parallel data and large raw text without aligned tables to
enrich the template learning. Experiments on datasets from a variety of
different domains show that VTM is able to generate more diverse outputs while
maintaining good fluency and quality.
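The disentanglement in contribution (a) and the raw-text objective in (b) can be illustrated with a minimal two-latent-space sketch: a content latent z_c inferred from the table, a template latent z_t inferred from the text, and raw text without tables training only the template side while z_c is drawn from the prior. This is a toy (random linear encoders/decoders, squared-error reconstruction on feature vectors), not the paper's actual architecture or losses; all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_std_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), computed per example."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

class ToyVTM:
    """Toy two-latent-space model: z_c (content, from the table) and
    z_t (template, from the text). Linear maps only; this illustrates
    how the paired and raw-text losses combine, not the real VTM."""

    def __init__(self, d_table, d_text, d_c, d_t):
        self.enc_c = rng.normal(scale=0.1, size=(d_table, 2 * d_c))
        self.enc_t = rng.normal(scale=0.1, size=(d_text, 2 * d_t))
        self.dec = rng.normal(scale=0.1, size=(d_c + d_t, d_text))
        self.d_c, self.d_t = d_c, d_t

    def paired_loss(self, table, text):
        """ELBO-style loss on a <table, text> pair: reconstruct the text
        from both latents, regularize both posteriors toward N(0, I)."""
        h_c = table @ self.enc_c
        mu_c, lv_c = h_c[:, : self.d_c], h_c[:, self.d_c :]
        h_t = text @ self.enc_t
        mu_t, lv_t = h_t[:, : self.d_t], h_t[:, self.d_t :]
        z_c = mu_c + np.exp(0.5 * lv_c) * rng.normal(size=mu_c.shape)
        z_t = mu_t + np.exp(0.5 * lv_t) * rng.normal(size=mu_t.shape)
        recon = np.mean((np.concatenate([z_c, z_t], axis=1) @ self.dec - text) ** 2)
        return recon + np.mean(kl_std_normal(mu_c, lv_c) + kl_std_normal(mu_t, lv_t))

    def raw_text_loss(self, text):
        """Raw text has no table: infer only z_t and draw z_c from the
        prior, so non-parallel text still trains the template space."""
        h_t = text @ self.enc_t
        mu_t, lv_t = h_t[:, : self.d_t], h_t[:, self.d_t :]
        z_t = mu_t + np.exp(0.5 * lv_t) * rng.normal(size=mu_t.shape)
        z_c = rng.normal(size=(text.shape[0], self.d_c))  # prior sample
        recon = np.mean((np.concatenate([z_c, z_t], axis=1) @ self.dec - text) ** 2)
        return recon + np.mean(kl_std_normal(mu_t, lv_t))

# Combine a small paired batch with a larger raw-text batch.
model = ToyVTM(d_table=8, d_text=16, d_c=4, d_t=4)
table = rng.normal(size=(5, 8))
text = rng.normal(size=(5, 16))
raw = rng.normal(size=(20, 16))
total = model.paired_loss(table, text) + model.raw_text_loss(raw)
print(float(total))
```

At generation time the analogous move is to fix z_c from a new table and vary z_t to obtain differently templated descriptions of the same content.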
Related papers
- "What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs [19.07429412219697]
We present K2Q, a collection of five datasets converted from KIE to a prompt-response format using a plethora of bespoke templates.
We empirically compare the performance of seven baseline generative models on K2Q with zero-shot prompting.
We find that creating diverse and intricate KIE questions enhances the performance and robustness of VRDU models.
arXiv Detail & Related papers (2024-10-20T19:42:30Z)
- Detection and Measurement of Syntactic Templates in Generated Text [58.111650675717414]
We offer an analysis of syntactic features to characterize general repetition in models.
We find that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts.
arXiv Detail & Related papers (2024-06-28T19:34:23Z)
- PixT3: Pixel-based Table-To-Text Generation [66.96636025277536]
We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations.
Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive and superior to generators that operate solely on text.
arXiv Detail & Related papers (2023-11-16T11:32:47Z)
- Modelling the semantics of text in complex document layouts using graph transformer networks [0.0]
We propose a model that approximates the human reading pattern of a document and outputs a unique semantic representation for every text span.
We base our architecture on a graph representation of the structured text, and we demonstrate that not only can we retrieve semantically similar information across documents but also that the embedding space we generate captures useful semantic information.
arXiv Detail & Related papers (2022-02-18T11:49:06Z)
- Generating Wikipedia Article Sections from Diverse Data Sources [57.23574577984244]
We benchmark several training and decoding strategies on WikiTableT.
Our qualitative analysis shows that the best approaches can generate fluent and high quality texts but they sometimes struggle with coherence.
arXiv Detail & Related papers (2020-12-29T19:35:34Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- Extraction of Templates from Phrases Using Sequence Binary Decision Diagrams [3.867363075280544]
This paper presents an unsupervised approach for extracting templates from tagged text alone, using a novel relaxed variant of the Sequence Binary Decision Diagram (SeqBDD).
The main contribution of this paper is a relaxed form of the SeqBDD construction algorithm that enables it to form general representations from a small amount of data.
Experiments show that the method is capable of high-quality extraction on tasks based on verb+preposition templates from corpora and phrasal templates from short messages from social media.
arXiv Detail & Related papers (2020-01-28T05:30:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences of its use.