SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation
- URL: http://arxiv.org/abs/2305.11061v1
- Date: Wed, 10 May 2023 10:01:36 GMT
- Title: SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation
- Authors: Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin and Han Jiang
- Abstract summary: The current mainstream end-to-end Text2SQL model is not only difficult to build due to its complex structure and high requirements for training data, but also difficult to adjust due to massive parameters.
This paper proposes a pipelined Text2SQL method: SPSQL. Experiments demonstrate it achieves the best performance compared with the end-to-end method and other pipeline methods.
We construct the dataset based on the marketing business data of the State Grid Corporation of China.
- Score: 13.196264569882777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Converting text into the structured query language (Text2SQL) is a research
hotspot in the field of natural language processing (NLP), which has broad
application prospects. In the era of big data, databases have penetrated all
walks of life; the collected data is large in scale, diverse in variety, and wide
in scope, which makes data queries cumbersome and inefficient and places higher
demands on the Text2SQL model. In
practical applications, the current mainstream end-to-end Text2SQL model is not
only difficult to build due to its complex structure and high requirements for
training data, but also difficult to adjust due to massive parameters. In
addition, it is hard for the model to achieve the desired accuracy.
Based on this, this paper proposes a pipelined Text2SQL method: SPSQL. This
method disassembles the Text2SQL task into four subtasks--table selection,
column selection, SQL generation, and value filling, which can be converted
into a text classification problem, a sequence labeling problem, and two text
generation problems, respectively. Then, we construct data formats of different
subtasks based on existing data and improve the accuracy of the overall model
by improving the accuracy of each submodel. We also use the named entity
recognition module and data augmentation to optimize the overall model. We
construct the dataset based on the marketing business data of the State Grid
Corporation of China. Experiments demonstrate our proposed method achieves the
best performance compared with the end-to-end method and other pipeline
methods.
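The four-subtask decomposition can be read as a simple sequential pipeline in which each sub-model's output constrains the next. The Python sketch below only illustrates that data flow under assumed interfaces; the sub-model objects and their method names (table_clf.predict, value_filler.fill, etc.) are hypothetical placeholders and do not come from the paper.

```python
# Minimal sketch of the four-stage SPSQL-style decomposition described above.
# The sub-model objects (table_clf, column_tagger, sql_generator, value_filler)
# and their method names are hypothetical placeholders, not the paper's code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParsedQuery:
    tables: List[str]    # subtask 1: table selection (text classification)
    columns: List[str]   # subtask 2: column selection (sequence labeling)
    sql_skeleton: str    # subtask 3: SQL generation (text generation, values masked)
    sql: str             # subtask 4: value filling (text generation)


def spsql_pipeline(question: str, schema: Dict[str, List[str]],
                   table_clf, column_tagger, sql_generator, value_filler) -> ParsedQuery:
    """Chain the four sub-models; each stage consumes the previous stage's output."""
    # 1. Table selection: classify which tables in the schema are relevant to the question.
    tables = table_clf.predict(question, list(schema.keys()))

    # 2. Column selection: sequence-label the question against columns of the chosen tables.
    candidate_columns = [col for t in tables for col in schema[t]]
    columns = column_tagger.predict(question, candidate_columns)

    # 3. SQL generation: generate a query skeleton with placeholders instead of literal values.
    sql_skeleton = sql_generator.generate(question, tables, columns)

    # 4. Value filling: extract literal values from the question (e.g. with an NER module,
    #    as the abstract mentions) and substitute them into the skeleton.
    sql = value_filler.fill(question, sql_skeleton)

    return ParsedQuery(tables, columns, sql_skeleton, sql)
```

Because the stages are separate models, improving any single stage (for example, the column tagger) raises end-to-end accuracy, which is the tuning strategy the abstract describes.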
Related papers
- CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z) - Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice.
It is based on a real-world deployment of FootballDB, a system that was deployed over a nine-month period in the context of the FIFA World Cup 2022.
All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs.
"Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and
Text-to-Function -- with Real Applications in Traffic Domain [14.194710636073808]
The previous state-of-the-art (SOTA) method achieved remarkable execution accuracy on the Spider dataset.
We develop a more adaptable and more general prompting method, involving query rewriting and SQL boosting.
In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79.
arXiv Detail & Related papers (2023-10-28T16:32:40Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given databases.
We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results with up to a 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question into its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task with neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting syntax into the question-schema interaction graph encoder for Text-to-SQL parsing.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z) - Data Augmentation with Hierarchical SQL-to-Question Generation for
Cross-domain Text-to-SQL Parsing [40.65143087243074]
This paper presents a simple yet effective data augmentation framework.
First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar.
Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions.
arXiv Detail & Related papers (2021-03-03T07:37:38Z)