TAPAS: Weakly Supervised Table Parsing via Pre-training
- URL: http://arxiv.org/abs/2004.02349v2
- Date: Tue, 21 Apr 2020 15:09:48 GMT
- Title: TAPAS: Weakly Supervised Table Parsing via Pre-training
- Authors: Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Martin Eisenschlos
- Abstract summary: We present TAPAS, an approach to question answering over tables without generating logical forms.
We experiment with three different semantic parsing datasets.
We find that TAPAS outperforms or rivals semantic parsing models, improving state-of-the-art accuracy on SQA and performing on par with the state of the art on WIKISQL and WIKITQ.
- Score: 16.661382998729067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answering natural language questions over tables is usually seen as a
semantic parsing task. To alleviate the collection cost of full logical forms,
one popular approach focuses on weak supervision consisting of denotations
instead of logical forms. However, training semantic parsers from weak
supervision poses difficulties, and in addition, the generated logical forms
are only used as an intermediate step prior to retrieving the denotation. In
this paper, we present TAPAS, an approach to question answering over tables
without generating logical forms. TAPAS trains from weak supervision, and
predicts the denotation by selecting table cells and optionally applying a
corresponding aggregation operator to such selection. TAPAS extends BERT's
architecture to encode tables as input, initializes from an effective joint
pre-training of text segments and tables crawled from Wikipedia, and is trained
end-to-end. We experiment with three different semantic parsing datasets, and
find that TAPAS outperforms or rivals semantic parsing models by improving
state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with
the state-of-the-art on WIKISQL and WIKITQ, but with a simpler model
architecture. We additionally find that transfer learning, which is trivial in
our setting, from WIKISQL to WIKITQ, yields 48.7 accuracy, 4.2 points above the
state-of-the-art.
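To make the cell-selection-plus-aggregation interface described above concrete, here is a minimal usage sketch based on the TAPAS port in the HuggingFace transformers library (an assumption of this summary, not the authors' original release, which lives in the google-research repositories); the checkpoint name and the toy table are illustrative placeholders.

```python
# Minimal sketch: querying a table with the TAPAS port in HuggingFace transformers.
# Assumes `transformers`, `torch`, and `pandas` are installed; the checkpoint below
# is a commonly published fine-tuned model, used here only for illustration.
import pandas as pd
from transformers import TapasForQuestionAnswering, TapasTokenizer

model_name = "google/tapas-base-finetuned-wtq"  # WIKITQ fine-tuned checkpoint (assumption)
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)

# TAPAS expects table cells as strings; this is a toy example table.
table = pd.DataFrame(
    {"City": ["Paris", "London", "Madrid"],
     "Population": ["2100000", "8900000", "3300000"]}
)
queries = ["What is the total population of Paris and Madrid?"]

inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
outputs = model(**inputs)

# The model predicts (a) which cells to select and (b) an aggregation operator
# to apply to that selection, rather than generating a logical form.
coords, agg_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
print("selected cells:", coords[0], "aggregation id:", agg_indices[0])
```

The point that matches the abstract is the output format: not a logical form, but a set of cell coordinates plus an aggregation operator id (NONE, SUM, AVERAGE, COUNT) applied to the selected cells.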
Related papers
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table
Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
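As a hedged reading of the control flow implied by that description, the sketch below evolves a table step by step, with a trivial rule-based planner standing in for the LLM prompting the actual framework relies on; the operation names and the plan_next_operation helper are invented for illustration, not the paper's API.

```python
# Illustrative control flow only: a Chain-of-Table-style loop in which a
# (here stubbed) planner picks a table operation, the operation is applied,
# and the evolving table is fed back until the planner decides to answer.
import pandas as pd

def plan_next_operation(question: str, table: pd.DataFrame) -> tuple[str, dict]:
    """Placeholder planner. The real framework prompts an LLM with the question
    and the current table; here we hard-code a tiny two-step plan."""
    if "Population" in table.columns and len(table.columns) > 2:
        return "select_columns", {"columns": ["City", "Population"]}
    return "answer", {}

def apply_operation(table: pd.DataFrame, op: str, args: dict) -> pd.DataFrame:
    if op == "select_columns":
        return table[args["columns"]]
    if op == "sort_by":
        return table.sort_values(args["column"], ascending=False)
    return table

question = "Which city has the largest population?"
table = pd.DataFrame({
    "City": ["Paris", "London"],
    "Country": ["France", "UK"],
    "Population": [2_100_000, 8_900_000],
})

# Evolve the table until the planner signals it is ready to answer.
while True:
    op, args = plan_next_operation(question, table)
    if op == "answer":
        break
    table = apply_operation(table, op, args)

print(table)  # the final, simplified table is what the answer is read from
```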
arXiv Detail & Related papers (2024-01-09T07:46:26Z)
- Bridge the Gap between Language models and Tabular Understanding [99.88470271644894]
The table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain.
Despite the promising findings, there is an input gap between pre-training and fine-tuning phases.
We propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text.
arXiv Detail & Related papers (2023-02-16T15:16:55Z)
- ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples [15.212332890570869]
We develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design.
ReasTAP achieves new state-of-the-art performance on all benchmarks and delivers a significant improvement in the low-resource setting.
arXiv Detail & Related papers (2022-10-22T07:04:02Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
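For readers unfamiliar with the first of those three modules, a minimal sketch of auxiliary row/column embeddings follows: each table token gets its row and column indices embedded and summed into its input embedding. The sizes and module names below are illustrative, not taken from the paper.

```python
# Illustrative sketch of auxiliary row/column embeddings: each table token's row
# index and column index are embedded and added to its word embedding.
# Dimensions and vocabulary sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class TableTokenEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, max_rows=64, max_cols=32):
        super().__init__()
        self.word = nn.Embedding(vocab_size, hidden)
        self.row = nn.Embedding(max_rows, hidden)   # row id 0 can mark non-table tokens
        self.col = nn.Embedding(max_cols, hidden)   # column id 0 likewise
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, row_ids, col_ids):
        x = self.word(token_ids) + self.row(row_ids) + self.col(col_ids)
        return self.norm(x)

emb = TableTokenEmbedding()
token_ids = torch.randint(0, 30522, (1, 10))
row_ids = torch.tensor([[0, 0, 1, 1, 1, 2, 2, 2, 3, 3]])  # question tokens get row 0
col_ids = torch.tensor([[0, 0, 1, 2, 3, 1, 2, 3, 1, 2]])
print(emb(token_ids, row_ids, col_ids).shape)  # torch.Size([1, 10, 256])
```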
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- Topic Transferable Table Question Answering [33.54533181098762]
Weakly-supervised table question answering (TableQA) models have achieved state-of-the-art performance by using a pre-trained BERT transformer to jointly encode a question and a table and produce a structured query for the question.
In practical settings, TableQA systems are deployed over table corpora whose topic and word distributions are quite distinct from BERT's pretraining corpus.
We propose T3QA (Topic Transferable Table Question Answering) as a pragmatic adaptation framework for TableQA.
arXiv Detail & Related papers (2021-09-15T15:34:39Z)
- Understanding tables with intermediate pre-training [11.96734018295146]
We adapt TAPAS, a table-based BERT model, to recognize entailment.
We evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency.
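As an illustration of what such a pruning pre-processing step can look like, the sketch below keeps only the columns with the highest token overlap with the question, under a fixed column budget; this is a generic heuristic, not necessarily the exact technique evaluated in the paper.

```python
# Illustrative table pruning: score columns by token overlap with the question
# and keep the top-k, shrinking the input before it reaches the model.
# The scoring rule and budget are placeholders, not the paper's exact method.
import pandas as pd

def prune_columns(table: pd.DataFrame, question: str, budget: int = 3) -> pd.DataFrame:
    q_tokens = set(question.lower().split())

    def score(col: str) -> int:
        cell_tokens = set(" ".join(map(str, table[col])).lower().split())
        header_tokens = set(col.lower().split())
        return len(q_tokens & (cell_tokens | header_tokens))

    ranked = sorted(table.columns, key=score, reverse=True)
    keep = set(ranked[:budget])
    # Preserve the original column order for the kept columns.
    return table[[c for c in table.columns if c in keep]]

table = pd.DataFrame({
    "City": ["Paris", "London"], "Country": ["France", "UK"],
    "Mayor": ["Hidalgo", "Khan"], "Population": [2_100_000, 8_900_000],
})
print(prune_columns(table, "What is the population of Paris?", budget=2))
```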
arXiv Detail & Related papers (2020-10-01T17:43:27Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
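To make the synchronous-grammar idea concrete, the sketch below expands a single toy paired production into aligned (question, SQL) examples; the templates and slot fillers are invented for illustration and are far simpler than GraPPa's grammar.

```python
# Toy synchronous context-free grammar: each production pairs a question template
# with a SQL template, and both sides are expanded with the same slot fillers,
# yielding aligned synthetic (question, SQL) training pairs.
import itertools

# One paired production; the real grammar covers many more SQL constructs.
PRODUCTIONS = [
    {
        "question": "what is the {agg_word} {column} of {table}?",
        "sql": "SELECT {agg_op}({column}) FROM {table}",
    },
]
SLOTS = {
    ("agg_word", "agg_op"): [("highest", "MAX"), ("lowest", "MIN"), ("average", "AVG")],
    ("column",): [("population",), ("area",)],
    ("table",): [("cities",)],
}

def expand(production):
    keys = list(SLOTS)
    for combo in itertools.product(*(SLOTS[k] for k in keys)):
        fill = {name: value
                for names, values in zip(keys, combo)
                for name, value in zip(names, values)}
        yield production["question"].format(**fill), production["sql"].format(**fill)

for question, sql in expand(PRODUCTIONS[0]):
    print(question, "->", sql)
```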
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)