ReTAG: Reasoning Aware Table to Analytic Text Generation
- URL: http://arxiv.org/abs/2305.11826v2
- Date: Mon, 30 Oct 2023 03:24:37 GMT
- Title: ReTAG: Reasoning Aware Table to Analytic Text Generation
- Authors: Deepanway Ghosal and Preksha Nema and Aravindan Raghuveer
- Abstract summary: ReTAG is a table and reasoning aware model that uses vector-quantization to infuse different types of analytical reasoning into the output.
We extend (and open source 35.6K analytical, 55.9K descriptive instances) the ToTTo and InfoTabs datasets with the reasoning categories used in each reference sentence.
- Score: 12.603569641254417
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The task of table summarization involves generating text that both succinctly
and accurately represents the table or a specific set of highlighted cells
within a table. While significant progress has been made in table to text
generation techniques, models still mostly generate descriptive summaries,
which reiterates the information contained within the table in sentences.
Through analysis of popular table to text benchmarks, ToTTo (Parikh et al.,
2020) and InfoTabs (Gupta et al., 2020), we observe that in order to generate the
ideal summary, multiple types of reasoning are needed, coupled with access to
knowledge beyond the scope of the table. To address this gap, we propose ReTAG,
a table and reasoning aware model that uses vector-quantization to infuse
different types of analytical reasoning into the output. ReTAG achieves 2.2%,
2.9% improvement on the PARENT metric in the relevant slice of ToTTo and
InfoTabs for the table to text generation task over state of the art baselines.
Through human evaluation, we observe that output from ReTAG is up to 12% more
faithful and analytical compared to a strong table-aware model. To the best of
our knowledge, ReTAG is the first model that can controllably use multiple
reasoning methods within a structure-aware sequence to sequence model to
surpass state of the art performance in multiple table to text tasks. We extend
(and open source 35.6K analytical, 55.9K descriptive instances) the ToTTo and
InfoTabs datasets with the reasoning categories used in each reference
sentence.
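To make the vector-quantization idea concrete, here is a minimal PyTorch sketch of a reasoning-category codebook used as a discrete bottleneck. The codebook size, embedding dimension, and straight-through estimator are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of vector-quantization as a reasoning-category bottleneck,
# loosely inspired by ReTAG's description. Codebook size and dimensions are
# hypothetical assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

class ReasoningCodebook(nn.Module):
    def __init__(self, num_reasoning_types=6, dim=512):
        super().__init__()
        # One learnable code vector per analytical reasoning category
        # (e.g. numerical, temporal, comparative); the count is hypothetical.
        self.codebook = nn.Embedding(num_reasoning_types, dim)

    def forward(self, h):
        # h: (batch, dim) pooled encoder state of the table + highlighted cells
        d = torch.cdist(h, self.codebook.weight)   # (batch, K) distances
        idx = d.argmin(dim=-1)                     # nearest reasoning code
        q = self.codebook(idx)                     # quantized vector
        # Straight-through estimator: gradients flow back to h; the codebook
        # itself would be trained with a commitment-style loss (omitted here).
        q_st = h + (q - h).detach()
        return q_st, idx

h = torch.randn(2, 512)
q, idx = ReasoningCodebook()(h)
print(idx)  # selected reasoning-category code for each table
```

At generation time, fixing `idx` instead of inferring it is what would make the reasoning type controllable, since the decoder conditions on the selected code vector.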
Related papers
- ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models [58.34560740973768]
We introduce a framework that leverages language models (LMs) to generate literature review tables.
A new dataset of 2,228 literature review tables extracted from ArXiv papers synthesizes a total of 7,542 research papers.
We evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context.
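As a rough illustration of this table-synthesis setting, the sketch below builds an LM prompt from paper abstracts and target column aspects; the prompt wording and data layout are assumptions, not the paper's actual pipeline.

```python
# A minimal sketch of prompting an LM to synthesize a literature review table
# from paper abstracts. The prompt format is a hypothetical placeholder.
def build_prompt(papers, aspects):
    rows = "\n".join(f"- {p['title']}: {p['abstract']}" for p in papers)
    cols = ", ".join(aspects)
    return (f"Create a comparison table with columns ({cols}) "
            f"for the following papers:\n{rows}")

prompt = build_prompt(
    [{"title": "ReTAG", "abstract": "Reasoning-aware table-to-text model."}],
    ["Task", "Method", "Datasets"],
)
print(prompt)
```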
arXiv Detail & Related papers (2024-10-25T18:31:50Z)
- Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text [21.699434525769586]
Existing measures for table quality evaluation fail to capture the overall semantics of the tables.
We propose TabEval, a novel table evaluation strategy that captures table semantics.
To validate our approach, we curate a dataset comprising text descriptions for 1,250 diverse Wikipedia tables.
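One way to evaluate a generated table at the level of semantics rather than surface form is to compare the atomic statements each table expresses. The sketch below is a simplified stand-in for that idea; the triple extraction and scoring are assumptions, not TabEval's actual decomposition.

```python
# A minimal sketch of semantics-level table evaluation: decompose reference
# and generated tables into atomic statements, then compare the sets.
# This simple triple extraction is an assumption, not TabEval's method.
def table_to_statements(header, rows):
    # One atomic statement per (row key, column, value) triple.
    return {f"{row[0]}'s {col} is {val}"
            for row in rows for col, val in zip(header[1:], row[1:])}

ref = table_to_statements(["Team", "Wins"], [["Lions", "9"], ["Bears", "7"]])
gen = table_to_statements(["Team", "Wins"], [["Lions", "9"], ["Bears", "6"]])
precision = len(gen & ref) / len(gen)
recall = len(ref & gen) / len(ref)
print(precision, recall)  # 0.5 0.5: the wrong cell is caught semantically
```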
arXiv Detail & Related papers (2024-06-21T02:18:03Z)
- TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement [34.73880086005418]
We propose TDeLTA, a novel, lightweight, and robust table detection method based on learning text arrangement.
To locate the tables precisely, we design a text-classification task, classifying the text blocks into 4 categories according to their semantic roles in the tables.
Compared to several state-of-the-art methods, TDeLTA achieves competitive results with only 3.1M model parameters on the large-scale public datasets.
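A minimal sketch of the text-block classification step: each block is mapped from simple layout features to one of four semantic roles. The role names and feature layout below are assumptions for illustration, not TDeLTA's architecture.

```python
# A minimal sketch of classifying text blocks by semantic role before
# grouping them into tables. The four labels are hypothetical.
import torch
import torch.nn as nn

ROLES = ["in-table", "header", "caption", "outside"]  # assumed categories

class TextBlockClassifier(nn.Module):
    def __init__(self, feat_dim=8, hidden=64):
        super().__init__()
        # feat_dim covers simple layout cues: normalized bbox (x0,y0,x1,y1),
        # width/height, and two text statistics (digit ratio, token count).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, len(ROLES)),
        )

    def forward(self, block_feats):        # (num_blocks, feat_dim)
        return self.mlp(block_feats)       # logits over the 4 roles

blocks = torch.rand(5, 8)
roles = TextBlockClassifier()(blocks).argmax(-1)
print([ROLES[i] for i in roles.tolist()])
```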
arXiv Detail & Related papers (2023-12-18T09:18:43Z)
- HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model.
Our approach involves injecting reasoning information into the input by emphasizing table-specific row data.
On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results.
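A minimal sketch of the highlighted-evidence idea: serialize the table and wrap the rows believed to be relevant in marker tokens before passing the text to the fine-tuned model. The marker tokens and serialization format are assumptions.

```python
# A minimal sketch of emphasizing evidence rows in a serialized table input.
# The <hl> markers and the layout are assumptions, not HeLM's exact format.
def serialize_with_highlights(header, rows, evidence_idx, query):
    lines = [" | ".join(header)]
    for i, row in enumerate(rows):
        text = " | ".join(str(c) for c in row)
        # Wrap evidence rows in marker tokens so the model can attend to them.
        lines.append(f"<hl> {text} </hl>" if i in evidence_idx else text)
    return f"question: {query}\ntable:\n" + "\n".join(lines)

prompt = serialize_with_highlights(
    ["Year", "Medals"], [[2016, 4], [2020, 7]], evidence_idx={1},
    query="How many medals were won in 2020?",
)
print(prompt)
```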
arXiv Detail & Related papers (2023-11-15T12:02:52Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
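To show what the task format implies, here is a hypothetical QTSumm-style instance; the field names and values are invented for illustration, not taken from the released benchmark.

```python
# A hypothetical query-focused table summarization instance: a table, a user
# query, and a reference summary that requires reasoning over rows.
example = {
    "table": {
        "header": ["Player", "Points", "Season"],
        "rows": [["A. Smith", 812, "2021"], ["B. Jones", 945, "2021"]],
    },
    "query": "Which player scored more points in 2021, and by how much?",
    "summary": "B. Jones outscored A. Smith by 133 points in 2021.",
}

# A model for this task must compare rows (945 - 812 = 133), not merely
# restate cells, which is what makes the benchmark reasoning-focused.
gap = example["table"]["rows"][1][1] - example["table"]["rows"][0][1]
assert gap == 133
```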
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We tackle the TAT-DQA task, i.e., answering questions over visually-rich table-text documents.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set.
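A minimal sketch of a semantic-oriented hierarchical graph over a table-text document: typed value nodes (quantities, dates) are linked up to the cells that contain them. The node and edge types are assumptions simplified from the paper's description.

```python
# A minimal sketch of a hierarchical graph over a table-text document, in the
# spirit of Doc2SoarGraph. Node/edge types here are simplified assumptions.
graph = {"nodes": [], "edges": []}

def add_node(kind, text):
    graph["nodes"].append({"id": len(graph["nodes"]), "kind": kind, "text": text})
    return graph["nodes"][-1]["id"]

cell = add_node("cell", "Revenue 2021: $4.2M")
qty = add_node("quantity", "4.2M")
date = add_node("date", "2021")
for child in (qty, date):
    # Link each extracted value node up to its containing cell, giving the
    # discrete reasoning module a typed structure to traverse.
    graph["edges"].append((child, cell, "contained-in"))

print(graph["edges"])  # value nodes linked to their source cell
```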
arXiv Detail & Related papers (2023-05-03T07:30:32Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
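A minimal sketch of synchronous-grammar data synthesis: each rule expands to an aligned (question, SQL) pair over a given table schema. The two templates below are invented for illustration; GraPPa induces its grammar from existing text-to-SQL data.

```python
# A minimal sketch of generating synthetic question-SQL pairs from aligned
# templates, in the spirit of GraPPa. The templates are illustrative.
import random

# Each rule expands a nonterminal to an aligned (question, SQL) fragment.
TEMPLATES = [
    ("what is the maximum {col} in {table}?",
     "SELECT MAX({col}) FROM {table}"),
    ("how many rows in {table} have {col} above {val}?",
     "SELECT COUNT(*) FROM {table} WHERE {col} > {val}"),
]

def sample_pair(table, columns):
    q_tpl, sql_tpl = random.choice(TEMPLATES)
    slots = {"table": table, "col": random.choice(columns), "val": 100}
    return q_tpl.format(**slots), sql_tpl.format(**slots)

print(sample_pair("matches", ["score", "attendance"]))
```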
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework to achieve the goal.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
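A minimal sketch of an optimal-transport matching loss between table-cell embeddings and generated-token embeddings, computed with entropic Sinkhorn iterations. The cost function and Sinkhorn parameters are assumptions; the paper's exact loss differs in detail.

```python
# A minimal sketch of a table-text optimal-transport matching loss using
# Sinkhorn iterations. Epsilon and iteration count are assumed values.
import torch

def ot_matching_loss(table_emb, text_emb, eps=0.1, iters=50):
    # Cost: cosine distance between every (cell, token) embedding pair.
    a = torch.nn.functional.normalize(table_emb, dim=-1)
    b = torch.nn.functional.normalize(text_emb, dim=-1)
    cost = 1.0 - a @ b.T                      # (n_cells, n_tokens)
    K = torch.exp(-cost / eps)                # Gibbs kernel
    u = torch.full((cost.shape[0],), 1.0 / cost.shape[0])  # uniform marginals
    v = torch.full((cost.shape[1],), 1.0 / cost.shape[1])
    r, c = torch.ones_like(u), torch.ones_like(v)
    for _ in range(iters):                    # Sinkhorn fixed-point updates
        r = u / (K @ c)
        c = v / (K.T @ r)
    plan = r[:, None] * K * c[None, :]        # transport plan
    return (plan * cost).sum()                # expected matching cost

loss = ot_matching_loss(torch.randn(4, 64), torch.randn(9, 64))
print(loss.item())
```

Minimizing this loss pushes every generated token toward some supporting table cell, which is the intuition behind using matching as a faithfulness constraint.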
arXiv Detail & Related papers (2020-05-03T02:54:26Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
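For reference, a hypothetical ToTTo-style instance: the input pairs a table with highlighted cell coordinates, and the target is one sentence supported by those cells. The field names follow the general shape of the public release, but the values here are invented.

```python
# A hypothetical ToTTo-style instance illustrating the controlled setting:
# generate a sentence supported by only the highlighted cells.
example = {
    "table_page_title": "2020 Summer Olympics",
    "table": [[{"value": "Country"}, {"value": "Gold"}],
              [{"value": "Japan"}, {"value": "27"}]],
    "highlighted_cells": [[1, 0], [1, 1]],  # (row, column) indices
    "sentence_annotations": [{"final_sentence": "Japan won 27 gold medals."}],
}

# The model sees the table plus these coordinates as its control signal.
for r, c in example["highlighted_cells"]:
    print(example["table"][r][c]["value"])
```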
arXiv Detail & Related papers (2020-04-29T17:53:45Z)