Related papers: GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

URL: http://arxiv.org/abs/2009.13845v2
Date: Sat, 29 May 2021 01:30:29 GMT
Title: GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Authors: Tao Yu and Chien-Sheng Wu and Xi Victoria Lin and Bailin Wang and Yi Chern Tan and Xinyi Yang and Dragomir Radev and Richard Socher and Caiming Xiong
Abstract summary: We present GraPPa, an effective pre-training approach for table semantic parsing. We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar. To maintain the model's ability to represent real-world data, we also include masked language modeling.
Score: 117.98107557103877
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them.

Related papers

TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization [2.1067477213933503]
TabGLM (Tabular Graph Language Model) is a novel multi-modal architecture designed to model both structural and semantic information from a table. It transforms each row of a table into a fully connected graph and serialized text, which are encoded using a graph neural network (GNN) and a text encoder, respectively. Evaluations across 25 benchmark datasets demonstrate substantial performance gains.
arXiv Detail & Related papers (2025-02-26T05:32:45Z)
HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model. Our approach involves injecting reasoning information into the input by emphasizing table-specific row data. On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results.
arXiv Detail & Related papers (2023-11-15T12:02:52Z)
Bridge the Gap between Language models and Tabular Understanding [99.88470271644894]
Table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain. Despite the promising findings, there is an input gap between pre-training and fine-tuning phases. We propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text.
arXiv Detail & Related papers (2023-02-16T15:16:55Z)
Importance of Synthesizing High-quality Data for Text-to-SQL Parsing [71.02856634369174]
State-of-the-art text-to-weighted algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We propose a novel framework that incorporates key relationships from schema, imposes strong typing, and schema-weighted column sampling.
arXiv Detail & Related papers (2022-12-17T02:53:21Z)
TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data. Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures. A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z)
Retrieving Complex Tables with Multi-Granular Graph Representation Learning [20.72341939868327]
The task of natural language table retrieval seeks to retrieve semantically relevant tables based on natural language queries. Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes. We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z)
Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data. Based on experimental results, neural semantics that leverage GAP MODEL obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-generative benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts. Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples. We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.