Weakly Supervised Mapping of Natural Language to SQL through Question
Decomposition
- URL: http://arxiv.org/abs/2112.06311v1
- Date: Sun, 12 Dec 2021 20:02:42 GMT
- Title: Weakly Supervised Mapping of Natural Language to SQL through Question
Decomposition
- Authors: Tomer Wolfson, Jonathan Berant and Daniel Deutch
- Abstract summary: We propose an alternative approach for training machine learning-based NLIDBs, using weak supervision.
We use the recently proposed question decomposition representation called QDMR, an intermediate between NL and formal query languages.
Our solution, requiring zero expert annotations, performs competitively with models trained on expert annotated data.
- Score: 39.32886310973576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Interfaces to Databases (NLIDBs), where users pose queries
in Natural Language (NL), are crucial for enabling non-experts to gain insights
from data. Developing such interfaces, by contrast, is dependent on experts who
often code heuristics for mapping NL to SQL. Alternatively, NLIDBs based on
machine learning models rely on supervised examples of NL to SQL mappings
(NL-SQL pairs) used as training data. Such examples are again procured using
experts, which typically involves more than a one-off interaction. Namely, each
data domain in which the NLIDB is deployed may have different characteristics
and therefore require either dedicated heuristics or domain-specific training
examples. To this end, we propose an alternative approach for training machine
learning-based NLIDBs, using weak supervision. We use the recently proposed
question decomposition representation called QDMR, an intermediate between NL
and formal query languages. Recent work has shown that non-experts are
generally successful in translating NL to QDMR. We consequently use NL-QDMR
pairs, along with the question answers, as supervision for automatically
synthesizing SQL queries. The NL questions and synthesized SQL are then used to
train NL-to-SQL models, which we test on five benchmark datasets. Extensive
experiments show that our solution, requiring zero expert annotations, performs
competitively with models trained on expert annotated data.
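The weak-supervision loop the abstract describes, i.e. using NL-QDMR pairs plus known question answers to validate automatically synthesized SQL, can be illustrated with a toy sketch. Everything below is hypothetical: the schema, the candidate queries, and the `filter_by_answer` helper are invented for illustration and are not the authors' code; the real system synthesizes candidates from QDMR steps rather than taking them as given.

```python
import sqlite3

def execute(conn, sql):
    """Run a query, returning first-column values, or None on SQL errors."""
    try:
        return [row[0] for row in conn.execute(sql)]
    except sqlite3.Error:
        return None

def filter_by_answer(conn, candidates, gold_answer):
    """Keep only candidate SQL whose execution result matches the known answer."""
    return [q for q in candidates if execute(conn, q) == gold_answer]

# Toy database standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id INTEGER, year INTEGER)")
conn.executemany("INSERT INTO papers VALUES (?, ?)", [(1, 2020), (2, 2021)])

# Two candidates, as if synthesized from a QDMR decomposition such as:
#   (1) return papers; (2) return #1 from 2021; (3) return count of #2
candidates = [
    "SELECT COUNT(*) FROM papers WHERE year = 2021",  # consistent with the answer
    "SELECT COUNT(*) FROM papers",                    # spurious candidate
]
kept = filter_by_answer(conn, candidates, gold_answer=[1])
print(kept)
```

The surviving NL-SQL pairs would then serve as training data for a standard NL-to-SQL model, replacing expert-annotated SQL.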
Related papers
- Fine-Tuning Language Models for Context-Specific SQL Query Generation [0.0]
This paper presents a novel approach to fine-tuning open-source large language models (LLMs) for the task of transforming natural language into SQL queries.
We introduce models specialized in generating SQL queries, trained on synthetic datasets tailored to the Snowflake SQL and GoogleSQL dialects.
Our methodology involves generating a context-specific dataset using GPT-4, then fine-tuning three open-source LLMs (Starcoder Plus, Code-Llama, and Mistral) employing the LoRA technique to optimize for resource constraints.
The fine-tuned models demonstrate superior performance in zero-shot settings compared to the baseline GPT-4.
arXiv Detail & Related papers (2023-12-04T18:04:27Z) - Natural language to SQL in low-code platforms [0.0]
We propose a pipeline allowing developers to write natural language (NL) queries.
We collect, label, and validate data covering the queries most often performed by OutSystems users.
We describe the entire pipeline, which comprises a feedback loop that allows us to quickly collect production data.
arXiv Detail & Related papers (2023-08-29T11:59:02Z) - ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural
Language to SQL Systems [16.33799752421288]
We introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases.
We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark.
arXiv Detail & Related papers (2023-06-07T19:37:55Z) - STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing [64.80483736666123]
We propose a novel pre-training framework STAR for context-dependent text-to-SQL parsing.
In addition, we construct a large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR.
Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks.
arXiv Detail & Related papers (2022-10-21T11:30:07Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query, based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - "What Do You Mean by That?" A Parser-Independent Interactive Approach
for Enhancing Text-to-SQL [49.85635994436742]
We include humans in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users through multi-choice questions.
PIIA is capable of enhancing text-to-SQL performance within limited interaction turns, as shown by both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z) - Data Agnostic RoBERTa-based Natural Language to SQL Query Generation [0.0]
The NL2SQL task aims at finding deep learning approaches to solve the problem of converting natural language questions into valid SQL queries.
We have presented an approach with data privacy at its core.
Although we have not achieved state-of-the-art results, we have eliminated the need for the table right from the training of the model.
arXiv Detail & Related papers (2020-10-11T13:18:46Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a mapping cannot be immediately determined.
The proposed method effectively improves the robustness of the text-to-SQL system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z) - ValueNet: A Natural Language-to-SQL System that Learns from Database
Information [4.788755317132195]
Building natural language interfaces for databases has been a long-standing challenge.
Recent focus of research has been on neural networks to tackle this challenge on complex datasets like Spider.
We propose two end-to-end NL-to-SQL systems that incorporate values, using the challenging Spider dataset.
arXiv Detail & Related papers (2020-05-29T15:43:39Z) - TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.