Learning to Reason for Text Generation from Scientific Tables
- URL: http://arxiv.org/abs/2104.08296v1
- Date: Fri, 16 Apr 2021 18:01:36 GMT
- Title: Learning to Reason for Text Generation from Scientific Tables
- Authors: Nafise Sadat Moosavi, Andreas Rückle, Dan Roth, Iryna Gurevych
- Abstract summary: We introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation.
Describing scientific tables goes beyond the surface realization of the table content and requires reasoning over table values.
We study the effectiveness of state-of-the-art data-to-text generation models on SciGen and evaluate the results using common metrics as well as human evaluation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce SciGen, a new challenge dataset for the task of
reasoning-aware data-to-text generation consisting of tables from scientific
articles and their corresponding descriptions. Describing scientific tables
goes beyond the surface realization of the table content and requires reasoning
over table values. The unique properties of SciGen are that (1) tables mostly
contain numerical values, and (2) the corresponding descriptions require
arithmetic reasoning. SciGen is therefore the first dataset that assesses the
arithmetic reasoning capabilities of generation models on complex input
structures, i.e., tables from scientific articles. We study the effectiveness
of state-of-the-art data-to-text generation models on SciGen and evaluate the
results using common metrics as well as human evaluation. Our results and
analyses show that (a) while humans readily reason when describing scientific
tables, the ability of state-of-the-art models is severely limited on this
task, (b) while adding more training data improves the results, it is not the
solution for reasoning-aware text generation, and (c) one of the main
bottlenecks for this task is the lack of proper automatic evaluation metrics.
The data, code, and annotations for human evaluation will be available at
https://github.com/UKPLab/SciGen. SciGen opens new avenues for future research
in reasoning-aware text generation and evaluation.
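The kind of arithmetic reasoning the abstract describes can be illustrated with a small sketch (the model names and scores below are hypothetical, not drawn from SciGen): describing a results table typically requires computing comparisons and margins over cell values rather than copying them verbatim.

```python
# Toy illustration of reasoning-aware description (hypothetical data):
# the margin in the generated sentence does not appear anywhere in the
# table itself; it only exists after an arithmetic step over the cells.

def describe_best(results: dict[str, float]) -> str:
    """Generate a simple reasoning-aware description of model scores."""
    best, best_score = max(results.items(), key=lambda kv: kv[1])
    runner_up = max(s for m, s in results.items() if m != best)
    margin = best_score - runner_up  # arithmetic step: compute the gap
    return (f"{best} achieves the best score ({best_score:.1f}), "
            f"outperforming the next best model by {margin:.1f} points.")

scores = {"BART": 41.2, "T5": 43.5, "GPT-2": 38.9}
print(describe_best(scores))
```

A plain surface realizer could only restate the three scores; producing the "by 2.3 points" comparison requires the arithmetic the dataset is designed to probe.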
Related papers
- ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models
We introduce a framework that leverages language models (LMs) to generate literature review tables.
A new dataset of 2,228 literature review tables extracted from ArXiv papers synthesizes a total of 7,542 research papers.
We evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context.
arXiv Detail & Related papers (2024-10-25T18:31:50Z)
- How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
"SciTabQA" is an innovative dataset to study question-answering over scientific heterogeneous data.
We benchmark three state-of-the-art Tabular QA models, and find that the best F1 score is only 0.462.
arXiv Detail & Related papers (2024-03-30T15:48:49Z)
- Towards Controlled Table-to-Text Generation with Scientific Reasoning
We present a new task for generating fluent and logical descriptions that match user preferences over scientific data, aiming to automate scientific document analysis.
We construct a new challenging dataset, SciTab, consisting of table-description pairs extracted from the scientific literature, with highlighted cells and a corresponding domain-specific knowledge base.
The results show that large models struggle to produce accurate content that aligns with user preferences. As the first work of its kind, this study should motivate further research in scientific domains.
arXiv Detail & Related papers (2023-12-08T22:57:35Z)
- QTSumm: Query-Focused Summarization over Tabular Data
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation
We propose a novel two-stage method that combines autoregressive and non-autoregressive generation (SANA).
Our approach includes: (1) skeleton generation with an autoregressive pointer network that selects key tokens from the source table; (2) an edit-based non-autoregressive generation model that produces texts via iterative insertion and deletion operations.
By integrating hard constraints from the skeleton, the non-autoregressive model improves the generation's coverage over the source table and thus enhances its faithfulness.
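The two-stage idea can be sketched minimally as follows (a toy sketch, not the authors' implementation: the pointer network and the edit-based model are replaced by trivial stand-ins, and all tokens are hypothetical):

```python
# Minimal sketch of sketch-and-refine generation: stage 1 selects
# skeleton tokens from the table; stage 2 inserts words around them
# while keeping the skeleton tokens as hard constraints.

def select_skeleton(table_cells: list[str], keep: set[str]) -> list[str]:
    # Stand-in for the autoregressive pointer network: keep key tokens
    # in their original order.
    return [c for c in table_cells if c in keep]

def refine(skeleton: list[str], insertions: dict[int, list[str]]) -> list[str]:
    # Stand-in for edit-based refinement: insert tokens at given slots.
    # Skeleton tokens are never deleted, preserving faithfulness.
    out: list[str] = []
    for i, tok in enumerate(skeleton):
        out.extend(insertions.get(i, []))
        out.append(tok)
    out.extend(insertions.get(len(skeleton), []))
    return out

cells = ["BLEU", "41.2", "ROUGE", "19.8"]
skeleton = select_skeleton(cells, keep={"BLEU", "41.2"})
sentence = refine(skeleton, {0: ["The", "model", "reaches", "a"],
                             1: ["score", "of"]})
print(" ".join(sentence))
```

Because the skeleton tokens pass through the refinement stage unchanged, every table value in the output is guaranteed to come from the source, which is the faithfulness argument the summary makes.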
arXiv Detail & Related papers (2021-05-31T08:18:13Z)
- Logical Natural Language Generation from Open-Domain Tables
We propose a new task in which a model must generate natural language statements that can be logically entailed by the facts.
To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset (Chen et al., 2019), which features a wide range of logical/symbolic inferences.
The new task poses challenges to the existing monotonic generation frameworks due to the mismatch between sequence order and logical order.
arXiv Detail & Related papers (2020-04-22T06:03:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.