SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation
- URL: http://arxiv.org/abs/2305.11061v1
- Date: Wed, 10 May 2023 10:01:36 GMT
- Title: SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation
- Authors: Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin and Han Jiang
- Abstract summary: The current mainstream end-to-end Text2SQL model is not only difficult to build due to its complex structure and high requirements for training data, but also difficult to adjust due to massive parameters.
This paper proposes a pipelined Text2SQL method: SPSQL. Experiments demonstrate it achieves the best performance compared with the end-to-end method and other pipeline methods.
We construct the dataset based on the marketing business data of the State Grid Corporation of China.
- Score: 13.196264569882777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Converting text into the structured query language (Text2SQL) is a research
hotspot in the field of natural language processing (NLP), which has broad
application prospects. In the era of big data, databases have penetrated all
walks of life; the collected data is large in scale, diverse in variety, and wide
in scope, which makes data queries cumbersome and inefficient and places higher
demands on the Text2SQL model. In
practical applications, the current mainstream end-to-end Text2SQL model is not
only difficult to build due to its complex structure and high requirements for
training data, but also difficult to adjust due to massive parameters. In
addition, it is hard for the model to achieve the desired accuracy.
Based on this, this paper proposes a pipelined Text2SQL method: SPSQL. This
method disassembles the Text2SQL task into four subtasks--table selection,
column selection, SQL generation, and value filling, which can be converted
into a text classification problem, a sequence labeling problem, and two text
generation problems, respectively. Then, we construct data formats of different
subtasks based on existing data and improve the accuracy of the overall model
by improving the accuracy of each submodel. We also use the named entity
recognition module and data augmentation to optimize the overall model. We
construct the dataset based on the marketing business data of the State Grid
Corporation of China. Experiments demonstrate our proposed method achieves the
best performance compared with the end-to-end method and other pipeline
methods.
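The four-subtask decomposition can be read as a simple sequential pipeline in which each sub-model's output constrains the next. The Python sketch below only illustrates that data flow under assumed interfaces; the sub-model objects and their method names (table_clf.predict, value_filler.fill, etc.) are hypothetical placeholders and do not come from the paper.

```python
# Minimal sketch of the four-stage SPSQL-style decomposition described above.
# The sub-model objects (table_clf, column_tagger, sql_generator, value_filler)
# and their method names are hypothetical placeholders, not the paper's code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParsedQuery:
    tables: List[str]    # subtask 1: table selection (text classification)
    columns: List[str]   # subtask 2: column selection (sequence labeling)
    sql_skeleton: str    # subtask 3: SQL generation (text generation, values masked)
    sql: str             # subtask 4: value filling (text generation)


def spsql_pipeline(question: str, schema: Dict[str, List[str]],
                   table_clf, column_tagger, sql_generator, value_filler) -> ParsedQuery:
    """Chain the four sub-models; each stage consumes the previous stage's output."""
    # 1. Table selection: classify which tables in the schema are relevant to the question.
    tables = table_clf.predict(question, list(schema.keys()))

    # 2. Column selection: sequence-label the question against columns of the chosen tables.
    candidate_columns = [col for t in tables for col in schema[t]]
    columns = column_tagger.predict(question, candidate_columns)

    # 3. SQL generation: generate a query skeleton with placeholders instead of literal values.
    sql_skeleton = sql_generator.generate(question, tables, columns)

    # 4. Value filling: extract literal values from the question (e.g. with an NER module,
    #    as the abstract mentions) and substitute them into the skeleton.
    sql = value_filler.fill(question, sql_skeleton)

    return ParsedQuery(tables, columns, sql_skeleton, sql)
```

Because the stages are separate models, improving any single stage (for example, the column tagger) raises end-to-end accuracy, which is the tuning strategy the abstract describes.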
Related papers
- CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z) - Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice.
It is based on a real-world deployment of FootballDB, a system that was deployed over a nine-month period in the context of the FIFA World Cup 2022.
All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z) - SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs.
"Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z) - Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and
Text-to-Function -- with Real Applications in Traffic Domain [14.194710636073808]
The previous state-of-the-art (SOTA) method achieved remarkable execution accuracy on the Spider dataset.
We develop a more adaptable and more general prompting method, involving query rewriting and SQL boosting.
In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79.
arXiv Detail & Related papers (2023-10-28T16:32:40Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given databases.
We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results with up to a 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question into its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task with neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting syntax into the question-schema interaction graph encoder for Text-to-SQL parsing.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z) - Data Augmentation with Hierarchical SQL-to-Question Generation for
Cross-domain Text-to-SQL Parsing [40.65143087243074]
This paper presents a simple yet effective data augmentation framework.
First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar.
Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions.
arXiv Detail & Related papers (2021-03-03T07:37:38Z)