Revisiting Challenges in Data-to-Text Generation with Fact Grounding
- URL: http://arxiv.org/abs/2001.03830v1
- Date: Sun, 12 Jan 2020 02:31:07 GMT
- Title: Revisiting Challenges in Data-to-Text Generation with Fact Grounding
- Authors: Hongmin Wang
- Abstract summary: We introduce a larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19.
We achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction.
- Score: 2.969705152497174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-to-text generation models face challenges in ensuring data fidelity by
referring to the correct input source. To inspire studies in this area, Wiseman
et al. (2017) introduced the RotoWire corpus on generating NBA game summaries
from the box- and line-score tables. However, limited attempts have been made
in this direction and the challenges remain. We observe a prominent bottleneck
in the corpus where only about 60% of the summary contents can be grounded to
the boxscore records. Such information deficiency tends to misguide a
conditioned language model to produce unconditioned random facts and thus leads
to factual hallucinations. In this work, we restore the information balance and
revamp this task to focus on fact-grounded data-to-text generation. We
introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding),
with 50% more data from the years 2017-19 and enriched input tables, hoping to
attract more research focus in this direction. Moreover, we achieve improved
data fidelity over the state-of-the-art models by integrating a new form of
table reconstruction as an auxiliary task to boost the generation quality.
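To make the grounding bottleneck concrete, below is a minimal sketch of how one might estimate the fraction of facts in a game summary that can be grounded to boxscore records. The flattened (entity, attribute, value) record format and the sentence-level number-matching heuristic are illustrative assumptions, not the paper's actual corpus analysis.

```python
import re

def grounding_ratio(summary, records):
    """Estimate the fraction of summary facts groundable to boxscore records.

    `records` is assumed to be a list of (entity, attribute, value) triples
    flattened from the box- and line-score tables, e.g.
    ("LeBron James", "PTS", "28"). This heuristic only checks numbers that
    co-occur with a record entity in the same sentence.
    """
    values_by_entity = {}
    for entity, _attr, value in records:
        values_by_entity.setdefault(entity, set()).add(value)

    grounded, total = 0, 0
    for sentence in re.split(r"(?<=[.!?])\s+", summary):
        numbers = re.findall(r"\b\d+\b", sentence)
        entities = [e for e in values_by_entity if e in sentence]
        for num in numbers:
            total += 1
            if any(num in values_by_entity[e] for e in entities):
                grounded += 1
    return grounded / total if total else 0.0

records = [("LeBron James", "PTS", "28"), ("LeBron James", "AST", "9")]
summary = "LeBron James scored 28 points and dished out 11 assists."
print(grounding_ratio(summary, records))  # 0.5: the 11 assists are ungrounded
```

Under a measure like this, the original RotoWire corpus grounds only about 60% of summary content, which is the imbalance RotoWire-FG is built to restore.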
Related papers
- Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability [27.16741353384065]
Text-to-vis models often rely on lexical matching between words in the questions and tokens in data schemas.
In this study, we examine the robustness of current text-to-vis models, an area that has not previously been explored.
We propose a novel framework based on Retrieval-Augmented Generation (RAG) technique, named GRED, specifically designed to address input perturbations in two variants.
arXiv Detail & Related papers (2024-04-10T16:12:50Z)
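As a toy illustration of the lexical-variability problem that GRED targets, the sketch below contrasts exact lexical matching of question words against schema columns with a string-similarity fallback. The schema, the example words, and the difflib-based similarity are assumptions for illustration, not the GRED retrieval mechanism.

```python
from difflib import SequenceMatcher

SCHEMA = ["country", "population", "gdp_per_capita"]

def match_column(word, schema, threshold=0.7):
    """Map a question word to a schema column.

    Exact lexical matching (what many text-to-vis models rely on) fails as
    soon as the question paraphrases the column name; a similarity fallback
    recovers some of these cases. A RAG-style system like GRED instead
    retrieves relevant examples to condition generation.
    """
    if word in schema:
        return word
    best = max(schema, key=lambda c: SequenceMatcher(None, word, c).ratio())
    return best if SequenceMatcher(None, word, best).ratio() >= threshold else None

print(match_column("population", SCHEMA))   # exact match: 'population'
print(match_column("populations", SCHEMA))  # fuzzy match: 'population'
print(match_column("wealth", SCHEMA))       # no match: None
```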
- Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
arXiv Detail & Related papers (2024-04-10T14:50:10Z)
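A bare-bones version of the sentence-level grounding check described above might look like the following; the content-word overlap criterion is a crude stand-in assumption for the study's human and model-based groundedness judgments.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was", "were",
             "and", "to", "it", "its", "that", "this", "with", "for", "by"}

def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def is_grounded(sentence, retrieved_docs, min_overlap=0.6):
    """Call a generated sentence grounded if most of its content words
    appear in at least one retrieved document (a rough proxy only)."""
    words = content_words(sentence)
    if not words:
        return True
    return any(len(words & content_words(doc)) / len(words) >= min_overlap
               for doc in retrieved_docs)

docs = ["The Eiffel Tower in Paris was completed in 1889."]
print(is_grounded("The Eiffel Tower was completed in 1889.", docs))       # True
print(is_grounded("The tower was designed by a Spanish architect.", docs))  # False
```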
- NumHG: A Dataset for Number-Focused Headline Generation [28.57003500212883]
Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text.
We introduce a new dataset, NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation.
We evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability.
arXiv Detail & Related papers (2023-09-04T09:03:53Z)
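The numerical-accuracy criterion used to evaluate NumHG systems can be approximated with a check that every numeral in a generated headline is supported by the article. Treating only verbatim numeral matches as correct is an assumption here, since NumHG also covers numerals that must be computed from the text.

```python
import re

NUM_RE = re.compile(r"\d+(?:\.\d+)?")

def numerals(text):
    return set(NUM_RE.findall(text))

def numeral_accuracy(headline, article):
    """Fraction of headline numerals that appear verbatim in the article.
    Real evaluation must also credit derived numbers (sums, rounding, etc.)."""
    head_nums = numerals(headline)
    if not head_nums:
        return 1.0
    return len(head_nums & numerals(article)) / len(head_nums)

article = "The company hired 1200 workers in 2023, up from 800 a year earlier."
print(numeral_accuracy("Firm adds 1200 jobs in 2023", article))  # 1.0
print(numeral_accuracy("Firm adds 1500 jobs in 2023", article))  # 0.5
```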
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
- Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic data generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)
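Decomposing a sentence into dependency arcs, the unit at which this entailment formulation operates, is straightforward with an off-the-shelf parser. The sketch below only extracts and compares arcs (using spaCy, an assumption here), whereas the paper trains a dedicated arc-level entailment model.

```python
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def dependency_arcs(text):
    """Return (head_lemma, relation, child_lemma) triples for a sentence."""
    return {(tok.head.lemma_, tok.dep_, tok.lemma_)
            for tok in nlp(text) if tok.dep_ != "punct"}

source = "The senators voted to pass the bill on Tuesday."
generated = "The bill was passed on Friday."

# Arcs in the generation with no counterpart in the source flag candidate
# factual errors at a finer granularity than sentence-level entailment.
for arc in sorted(dependency_arcs(generated) - dependency_arcs(source)):
    print(arc)
```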
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.