ReasTAP: Injecting Table Reasoning Skills During Pre-training via
Synthetic Reasoning Examples
- URL: http://arxiv.org/abs/2210.12374v1
- Date: Sat, 22 Oct 2022 07:04:02 GMT
- Title: ReasTAP: Injecting Table Reasoning Skills During Pre-training via
Synthetic Reasoning Examples
- Authors: Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev
- Abstract summary: We develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design.
ReasTAP achieves new state-of-the-art performance on all benchmarks and delivers a significant improvement in the low-resource setting.
- Score: 15.212332890570869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning over tabular data requires both table structure understanding and a
broad set of table reasoning skills. Current models with table-specific
architectures and pre-training methods perform well on understanding table
structures, but they still struggle with tasks that require various table
reasoning skills. In this work, we develop ReasTAP to show that high-level
table reasoning skills can be injected into models during pre-training without
a complex table-specific architecture design. We define 7 table reasoning
skills, such as numerical operation, temporal comparison, and conjunction. Each
reasoning skill is associated with one example generator, which synthesizes
questions over semi-structured tables according to the sampled templates. We
model the table pre-training task as a sequence generation task and pre-train
ReasTAP to generate precise answers to the synthetic examples. ReasTAP is
evaluated on four benchmarks covering three downstream tasks including: 1)
WikiSQL and WTQ for Table Question Answering; 2) TabFact for Table Fact
Verification; and 3) LogicNLG for Faithful Table-to-Text Generation.
Experimental results demonstrate that ReasTAP achieves new state-of-the-art
performance on all benchmarks and delivers a significant improvement in the
low-resource setting. Our code is publicly available at
https://github.com/Yale-LILY/ReasTAP.
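The abstract describes pairing each reasoning skill with an example generator that fills sampled question templates over a semi-structured table and trains the model to generate the exact answer as a sequence. The following is a minimal sketch of that recipe for a single skill (numerical operation); the table format, templates, and function names are illustrative assumptions, not code from the ReasTAP repository.

```python
# Minimal sketch of a template-based synthetic example generator for one
# reasoning skill (numerical operation). Illustrative only: templates,
# linearization, and names are assumptions, not the ReasTAP codebase.
import random

def flatten_table(table):
    """Linearize a table into text, roughly 'col : ... row 1 : ...'."""
    header = " | ".join(table["header"])
    rows = " ".join(
        f"row {i + 1} : " + " | ".join(str(c) for c in row)
        for i, row in enumerate(table["rows"])
    )
    return f"col : {header} {rows}"

def generate_numerical_example(table, numeric_col):
    """Sample a question template and synthesize a (source, target) pair."""
    col_idx = table["header"].index(numeric_col)
    values = [float(row[col_idx]) for row in table["rows"]]
    templates = {
        f"what is the sum of {numeric_col}?": sum(values),
        f"what is the maximum {numeric_col}?": max(values),
        f"how many rows have {numeric_col} greater than {{v}}?": None,
    }
    question, answer = random.choice(list(templates.items()))
    if answer is None:                      # comparison template needs a value
        v = random.choice(values)
        question = question.format(v=v)
        answer = sum(1 for x in values if x > v)
    # Pre-training instance: the model reads the question plus the flattened
    # table and is trained to generate the precise answer as text.
    source = f"{question} {flatten_table(table)}"
    target = str(answer)
    return source, target

table = {"header": ["player", "points"],
         "rows": [["alice", 30], ["bob", 12], ["carol", 25]]}
src, tgt = generate_numerical_example(table, "points")
print(src)
print(tgt)
```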
Related papers
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [81.76462101465354]
We present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
To better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
arXiv Detail & Related papers (2024-01-09T07:46:26Z)
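The Chain-of-Table entry above plans atomic table operations step by step and applies each one, so the evolving table itself records the intermediate reasoning. Below is a schematic sketch of that loop, not the authors' implementation: plan_next_step stands in for the LLM planner (here it returns a fixed demo plan), and the operation set is reduced to two illustrative functions.

```python
# Schematic sketch of a Chain-of-Table style loop (not the authors' code).
def select_rows(table, keep):                     # atomic op: keep some rows
    return {"header": table["header"],
            "rows": [r for i, r in enumerate(table["rows"]) if i in keep]}

def select_columns(table, cols):                  # atomic op: keep some columns
    idx = [table["header"].index(c) for c in cols]
    return {"header": cols,
            "rows": [[r[i] for i in idx] for r in table["rows"]]}

def plan_next_step(table, question, chain):
    """Placeholder for the LLM planner; returns a fixed two-step demo plan."""
    demo_plan = [("select_columns", (["team", "wins"],)),
                 ("select_rows", ([0, 1],)),
                 ("[E]", None)]                   # "[E]" means the plan is done
    return demo_plan[len(chain)] if len(chain) < len(demo_plan) else ("[E]", None)

def chain_of_table(table, question):
    ops = {"select_rows": select_rows, "select_columns": select_columns}
    chain = []
    while True:
        name, args = plan_next_step(table, question, chain)
        if name == "[E]":
            break
        table = ops[name](table, *args)           # evolve the table
        chain.append((name, args))
    # A final LLM call would read the evolved table and answer the question;
    # here we simply return the table the chain produced.
    return table, chain

table = {"header": ["team", "wins", "losses"],
         "rows": [["reds", 10, 2], ["blues", 8, 4], ["greens", 3, 9]]}
final_table, chain = chain_of_table(table, "which of the top two teams won more?")
print(chain)
print(final_table)
```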
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are often complex, spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
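The table retrieval entry above lists auxiliary row/column embeddings as one of the modules for explicitly encoding table structure. The sketch below shows, under the assumption of a standard PyTorch embedding stack, how per-token row and column ids can be embedded and added to the ordinary token embeddings; module names, id conventions, and dimensions are illustrative.

```python
# Minimal sketch of auxiliary row/column embeddings (illustrative, not the
# paper's implementation): each token carries a row id and a column id whose
# embeddings are summed with the token embedding.
import torch
import torch.nn as nn

class TableTokenEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model, max_rows=256, max_cols=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.row = nn.Embedding(max_rows, d_model)   # row id 0 = non-table text
        self.col = nn.Embedding(max_cols, d_model)   # col id 0 = non-table text

    def forward(self, token_ids, row_ids, col_ids):
        return self.tok(token_ids) + self.row(row_ids) + self.col(col_ids)

emb = TableTokenEmbedding(vocab_size=32000, d_model=768)
token_ids = torch.randint(0, 32000, (1, 6))          # one sequence of 6 tokens
row_ids = torch.tensor([[1, 1, 2, 2, 3, 3]])         # table row of each token
col_ids = torch.tensor([[1, 2, 1, 2, 1, 2]])         # table column of each token
print(emb(token_ids, row_ids, col_ids).shape)        # torch.Size([1, 6, 768])
```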
- HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation [35.73434495391091]
Hierarchical tables challenge existing methods with their hierarchical indexing and implicit calculation and semantic relationships.
This work presents HiTab, a free and open dataset for the research community to study question answering (QA) and natural language generation (NLG) over hierarchical tables.
arXiv Detail & Related papers (2021-08-15T10:14:21Z)
- Understanding tables with intermediate pre-training [11.96734018295146]
We adapt TAPAS, a table-based BERT model, to recognize entailment.
We evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency.
arXiv Detail & Related papers (2020-10-01T17:43:27Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
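The GraPPa entry above derives aligned question-SQL pairs from a synchronous context-free grammar, so each sampled question comes with its logical form. The toy sketch below illustrates the synchronous idea with a tiny hand-written grammar; the rules, wording, and schema are invented for illustration and are not GraPPa's grammar.

```python
# Toy sketch of synchronous question/SQL generation (not GraPPa's grammar):
# each nonterminal expands to an aligned (English, SQL) fragment, so the
# sampled question and query are built in lockstep.
import random

# Synchronous rules: nonterminal -> list of (english_template, sql_template)
RULES = {
    "ROOT": [("show the {COL} of rows where {COND}",
              "SELECT {COL} FROM t WHERE {COND}"),
             ("how many rows have {COND}",
              "SELECT COUNT(*) FROM t WHERE {COND}")],
    "COND": [("{COL} greater than {VAL}", "{COL} > {VAL}"),
             ("{COL} equal to {VAL}", "{COL} = {VAL}")],
}

def expand(symbol, columns, values):
    """Pick one synchronous rule and fill its nonterminals recursively."""
    nl, sql = random.choice(RULES[symbol])
    while "{COND}" in nl:                          # expand nested nonterminals
        sub_nl, sub_sql = expand("COND", columns, values)
        nl = nl.replace("{COND}", sub_nl, 1)
        sql = sql.replace("{COND}", sub_sql, 1)
    col = random.choice(columns)                   # ground terminals in a schema
    val = random.choice(values[col])
    nl = nl.replace("{COL}", col).replace("{VAL}", str(val))
    sql = sql.replace("{COL}", col).replace("{VAL}", str(val))
    return nl, sql

columns = ["points", "assists"]
values = {"points": [10, 25, 30], "assists": [3, 7, 12]}
question, query = expand("ROOT", columns, values)
print(question)
print(query)
```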
- TURL: Table Understanding through Representation Learning [29.6016859927782]
TURL is a novel framework that introduces the pre-training/finetuning paradigm to relational Web tables.
During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner.
We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
arXiv Detail & Related papers (2020-06-26T05:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.