Self-supervised Text-to-SQL Learning with Header Alignment Training
- URL: http://arxiv.org/abs/2103.06402v1
- Date: Thu, 11 Mar 2021 01:09:59 GMT
- Title: Self-supervised Text-to-SQL Learning with Header Alignment Training
- Authors: Donggyu Kim, Seanie Lee
- Abstract summary: Self-supervised learning is a de facto component of the recent success of deep learning in various fields.
We propose a novel self-supervised learning framework to tackle the discrepancy between a self-supervised learning objective and a task-specific objective.
Our method is effective for training the model with scarce labeled data.
- Score: 4.518012967046983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since we can leverage a large amount of unlabeled data without any human
supervision to train a model and transfer the knowledge to target tasks,
self-supervised learning is a de facto component of the recent success of deep
learning in various fields. However, in many cases there is a discrepancy
between the self-supervised learning objective and the task-specific objective.
To tackle this discrepancy in the Text-to-SQL task, we propose a novel
self-supervised learning framework. We utilize the task-specific properties of
the Text-to-SQL task and the underlying structure of table contents to train
models to learn useful knowledge of the \textit{header-column} alignment task
from unlabeled table data. We then transfer this knowledge to supervised
Text-to-SQL training with annotated samples, so that the model can leverage it
to better perform the \textit{header-span} alignment task when predicting SQL
statements. Experimental results show that our self-supervised learning
framework significantly improves the performance of strong existing BERT-based
models without using large external corpora. In particular, our method is
effective for training the model with scarce labeled data. The source code of
this work is available on GitHub.
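The abstract describes constructing a \textit{header-column} alignment objective from unlabeled tables. The snippet below is a minimal illustrative sketch of how such training pairs could be built, assuming a simple value-versus-header binary alignment formulation; it is not the authors' implementation (their exact objective and code are in the linked GitHub repository).

```python
import random

def make_header_alignment_examples(table, n_negatives=3):
    """Build illustrative header-column alignment pairs from one unlabeled table.

    `table` is assumed to look like {"headers": [...], "rows": [[...], ...]}.
    For each row we sample a cell value, pair it with its true header
    (label 1) and with a few other headers of the same table (label 0).
    A BERT-style encoder could then be pre-trained to score these pairs
    before fine-tuning on annotated Text-to-SQL data.
    """
    headers, rows = table["headers"], table["rows"]
    examples = []
    for row in rows:
        col_idx = random.randrange(len(headers))
        value = str(row[col_idx])
        # Positive pair: the value with the header of the column it came from.
        examples.append({"value": value, "header": headers[col_idx], "label": 1})
        # Negative pairs: the same value with other headers of the same table.
        other_cols = [i for i in range(len(headers)) if i != col_idx]
        for i in random.sample(other_cols, min(n_negatives, len(other_cols))):
            examples.append({"value": value, "header": headers[i], "label": 0})
    return examples

if __name__ == "__main__":
    table = {
        "headers": ["Player", "Team", "Points"],
        "rows": [["LeBron James", "Lakers", 27], ["Stephen Curry", "Warriors", 30]],
    }
    for example in make_header_alignment_examples(table, n_negatives=1):
        print(example)
```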
Related papers
- Tabular Transfer Learning via Prompting LLMs [52.96022335067357]
We propose a novel framework, Prompt to Transfer (P2T), that utilizes unlabeled (or heterogeneous) source data with large language models (LLMs).
P2T identifies a column feature in a source dataset that is strongly correlated with a target task feature to create examples relevant to the target task, thus creating pseudo-demonstrations for prompts (a rough illustrative sketch is given after this list).
arXiv Detail & Related papers (2024-08-09T11:30:52Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play [46.07002748587857]
We explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions.
We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used text-to-SQL datasets.
arXiv Detail & Related papers (2022-10-21T16:40:07Z) - Curriculum-Based Self-Training Makes Better Few-Shot Learners for
Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z) - Leveraging Table Content for Zero-shot Text-to-SQL with Meta-Learning [25.69875174742935]
Single-table text-to-SQL aims to transform a natural language question into a SQL query according to a single table.
We propose a new approach for the zero-shot text-to-SQL task which does not rely on any additional manual annotations.
We conduct extensive experiments on a public open-domain text-to-SQL dataset and a domain-specific dataset, ESQL.
arXiv Detail & Related papers (2021-09-12T01:01:28Z) - Structure-Grounded Pretraining for Text-to-SQL [75.19554243393814]
We present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL.
We identify a set of novel prediction tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder.
arXiv Detail & Related papers (2020-10-24T04:35:35Z) - Learning Better Representation for Tables by Self-Supervised Tasks [23.69766883380125]
We propose two self-supervised tasks, Number Ordering and Significance Ordering, to help to learn better table representation.
We test our methods on the widely used dataset ROTOWIRE, which consists of NBA game statistics and related news.
arXiv Detail & Related papers (2020-10-15T09:03:38Z) - TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z) - Exploiting Structured Knowledge in Text via Graph-Guided Representation
Learning [73.0598186896953]
We present two self-supervised tasks for learning over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
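The P2T entry above describes selecting a source column that is strongly correlated with a target task feature and turning it into pseudo-demonstrations for an LLM prompt. The sketch below is a hypothetical illustration of that idea only; the function name, prompt format, and correlation criterion are assumptions, not taken from the paper.

```python
import numpy as np

def build_pseudo_demonstrations(source_cols, target_feature, source_labels, k=3):
    """Hypothetical sketch of correlation-based pseudo-demonstrations.

    source_cols:    dict mapping column name -> 1-D numeric array (source data)
    target_feature: 1-D numeric array aligned with the source rows
    source_labels:  list of label strings for the source rows
    Picks the source column with the highest absolute Pearson correlation
    to the target feature and formats its first k rows as prompt examples.
    """
    best_col = max(
        source_cols,
        key=lambda c: abs(np.corrcoef(source_cols[c], target_feature)[0, 1]),
    )
    demos = [
        f"{best_col} = {source_cols[best_col][i]} -> label = {source_labels[i]}"
        for i in range(min(k, len(source_labels)))
    ]
    return "\n".join(demos)

if __name__ == "__main__":
    cols = {
        "age": np.array([25.0, 40.0, 60.0, 35.0]),
        "height": np.array([170.0, 180.0, 165.0, 175.0]),
    }
    target = np.array([0.2, 0.5, 0.9, 0.4])  # hypothetical target-task feature
    print(build_pseudo_demonstrations(cols, target, ["low", "mid", "high", "mid"], k=2))
```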