RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL
- URL: http://arxiv.org/abs/2406.09133v1
- Date: Thu, 13 Jun 2024 14:04:34 GMT
- Title: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL
- Authors: Jiawen Yi, Guo Chen, Zixiang Shen
- Abstract summary: This paper introduces a method for Text-to-SQL based on Refined Schema and Hardness Prompt.
It reduces storage and training costs while maintaining performance.
Our experiments on the Spider dataset, specifically with large-scale LMs, achieved an exceptional execution accuracy (EX) of 82.6%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-SQL is a technology that converts natural language queries into the structured query language SQL. A novel research approach that has recently gained attention focuses on methods based on the complexity of SQL queries, achieving notable performance improvements. However, existing methods entail significant storage and training costs, which hampers their practical application. To address this issue, this paper introduces a method for Text-to-SQL based on Refined Schema and Hardness Prompt. By filtering out low-relevance schema information with a refined schema and identifying query hardness through a Language Model (LM) to form prompts, this method reduces storage and training costs while maintaining performance. Notably, this method is applicable to any sequence-to-sequence (seq2seq) LM. Our experiments on the Spider dataset, specifically with large-scale LMs, achieved an exceptional execution accuracy (EX) of 82.6%, demonstrating the effectiveness and greater suitability of our method for real-world applications.
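To make the prompt construction concrete, the following is a minimal sketch of the refined-schema-plus-hardness-prompt idea for a seq2seq LM. It is not the authors' implementation: the token-overlap relevance score, the keyword hardness heuristic, and the prompt template are all illustrative stand-ins for the paper's LM-based components.

```python
# A minimal sketch of the Refined Schema + Hardness Prompt idea, assuming
# a simple token-overlap relevance score and a keyword hardness heuristic
# in place of the paper's LM-based components.

def refine_schema(question: str, schema: dict, top_k: int = 3) -> dict:
    """Keep only the columns most lexically relevant to the question."""
    q_tokens = set(question.lower().split())
    refined = {}
    for table, columns in schema.items():
        ranked = sorted(columns, key=lambda c: -len(q_tokens & set(c.lower().split("_"))))
        refined[table] = ranked[:top_k]
    return refined

def predict_hardness(question: str) -> str:
    """Stand-in for the LM hardness classifier; Spider uses the levels
    easy/medium/hard/extra, so a keyword heuristic fakes two of them."""
    hard_cues = ("average", "group", "having", "most", "least")
    return "hard" if any(w in question.lower() for w in hard_cues) else "easy"

def build_prompt(question: str, schema: dict) -> str:
    """Serialize the refined schema plus a hardness tag for a seq2seq LM."""
    refined = refine_schema(question, schema)
    tables = " | ".join(f"{t}: {', '.join(cols)}" for t, cols in refined.items())
    return f"hardness: {predict_hardness(question)} | question: {question} | schema: {tables}"

schema = {"singer": ["singer_id", "name", "country", "age"],
          "concert": ["concert_id", "singer_id", "year"]}
print(build_prompt("What is the average age of singers from France?", schema))
```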
Related papers
- Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL [83.99974309930072]
Knowledge distillation (KD) is a common approach that aims to distill knowledge from a larger teacher model into a smaller student model.
We propose KID, knowledge distillation with imperfect data, which effectively boosts performance without adding much training cost.
KID can not only achieve consistent and significant performance gains across all model types and sizes, but also effectively improve the training efficiency.
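The general distillation objective that KID builds on is easy to state in code. Below is a minimal, dependency-free sketch of token-level distillation at a single output position; the temperature, the loss form, and the toy logits are illustrative assumptions, not the paper's recipe.

```python
# A dependency-free sketch of token-level knowledge distillation at one
# output position; temperature and toy logits are illustrative, and the
# usual hard-label term is omitted for brevity.
import math

def softmax(logits, temperature=2.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# The teacher is confident about token 0; the student should move toward it.
print(distillation_loss(student_logits=[2.0, 0.5, 0.1], teacher_logits=[4.0, 1.0, 0.2]))
```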
arXiv Detail & Related papers (2024-10-15T07:51:00Z)
- E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-SQL, a novel pipeline designed to address these challenges through direct schema linking and candidate predicate augmentation.
E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure.
We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
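A minimal sketch of question enrichment in this spirit follows; the substring matcher and the wording of the enriched question are assumptions, standing in for E-SQL's model-based schema linking.

```python
# A toy enrichment step: append the schema items the question appears to
# mention. The substring matcher is an assumption standing in for E-SQL's
# model-based schema linking.

def enrich_question(question: str, schema: dict) -> str:
    q = question.lower()
    hits = [f"{table}.{col}" for table, cols in schema.items()
            for col in cols if col.replace("_", " ") in q]
    return f"{question} (relevant items: {', '.join(hits)})" if hits else question

schema = {"employee": ["name", "salary", "department_id"],
          "department": ["department_id", "dept_name"]}
print(enrich_question("Which employee has the highest salary?", schema))
# -> Which employee has the highest salary? (relevant items: employee.salary)
```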
arXiv Detail & Related papers (2024-09-25T09:02:48Z)
- RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
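Retrieval-based few-shot prompting of this kind can be sketched compactly. In the toy example below, a string-similarity measure stands in for RB-SQL's trained retrieval modules, and the prompt layout is an assumption.

```python
# A toy retrieval-based prompt builder: pick the most similar solved
# examples and prepend them as demonstrations. String similarity and the
# prompt layout are assumptions, not RB-SQL's trained retrieval modules.
from difflib import SequenceMatcher

def retrieve_examples(question, pool, k=2):
    sim = lambda a, b: SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return sorted(pool, key=lambda ex: -sim(question, ex[0]))[:k]

def build_few_shot_prompt(question, pool):
    demos = "\n".join(f"Q: {q}\nSQL: {s}" for q, s in retrieve_examples(question, pool))
    return f"{demos}\nQ: {question}\nSQL:"

pool = [("How many singers are there?", "SELECT count(*) FROM singer"),
        ("List all concert names.", "SELECT name FROM concert")]
print(build_few_shot_prompt("How many concerts are there?", pool))
```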
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
- CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We propose a new pipeline that retrieves relevant data and context, selects an efficient schema, and synthesizes correct and efficient queries.
Our method achieves new state-of-the-art performance on the challenging cross-domain BIRD dataset.
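The pipeline shape described above can be sketched as three composed stages. Every function below is a stub with assumed behavior, standing in for CHESS's retrieval, schema selection, and LLM-based synthesis.

```python
# A schematic of the retrieve / select / synthesize pipeline shape; each
# stage is a stub with assumed behavior, not CHESS's implementation.

def retrieve_context(question: str, catalog: dict) -> dict:
    """Keep only tables whose names occur in the question."""
    return {t: cols for t, cols in catalog.items() if t in question.lower()}

def select_schema(context: dict, max_columns: int = 4) -> dict:
    """Trim each surviving table to a small column budget."""
    return {t: cols[:max_columns] for t, cols in context.items()}

def synthesize_sql(question: str, schema: dict) -> str:
    """Stand-in for the LLM call that writes the final query."""
    table = next(iter(schema), "unknown_table")
    return f"SELECT * FROM {table} -- generated from: {question}"

catalog = {"concert": ["id", "name", "year", "venue", "capacity"],
           "singer": ["id", "name", "age"]}
question = "Which concert happened in 2019?"
print(synthesize_sql(question, select_schema(retrieve_context(question, catalog))))
```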
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
- EPI-SQL: Enhancing Text-to-SQL Translation with Error-Prevention Instructions [0.5755004576310334]
This paper introduces EPI-SQL, a novel methodological framework leveraging Large Language Models (LLMs) to enhance the performance of Text-to-SQL tasks.
EPI-SQL operates through a four-step process to generate general error-prevention instructions (EPIs).
It provides task-specific guidance, enabling the model to circumvent potential errors for the task at hand.
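A minimal sketch of how such error-prevention instructions might be prepended to a Text-to-SQL prompt is shown below. The instruction texts and the fixed list are assumptions; the paper derives its EPIs with an LLM rather than hand-writing them.

```python
# A toy prompt builder that prepends error-prevention instructions. The
# EPI texts below are hand-written assumptions; the paper generates its
# EPIs with an LLM through a four-step process.

EPIS = [
    "Qualify every column with its table name or alias.",
    "Do not use aggregate functions inside WHERE; use HAVING instead.",
    "Match string literals to the exact values stored in the database.",
]

def build_epi_prompt(question: str, schema_text: str) -> str:
    guidance = "\n".join(f"- {epi}" for epi in EPIS)
    return (f"Avoid these common mistakes:\n{guidance}\n\n"
            f"Schema: {schema_text}\nQuestion: {question}\nSQL:")

print(build_epi_prompt("Average salary per department?",
                       "employee(name, salary, department)"))
```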
arXiv Detail & Related papers (2024-04-21T03:52:46Z)
- SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
"Prompt" is designed to improve the few-shot prompting capabilities of Text-to-LLMs.
"Prompt" outperforms previous approaches for in-context learning with few labeled data by a large margin.
We show that emphPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
- Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
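Execution-based consistency decoding can be illustrated with a few lines of Python: sample several candidate queries, execute each, and keep the most common execution result. The hard-coded candidates below stand in for samples drawn from an LLM, and the database is a toy.

```python
# A toy version of execution-based consistency decoding: run each sampled
# candidate query and keep the most common execution result. The
# hard-coded candidates stand in for samples drawn from an LLM.
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE singer(name TEXT, age INT);"
                   "INSERT INTO singer VALUES ('A', 30), ('B', 40);")

candidates = [
    "SELECT avg(age) FROM singer",
    "SELECT avg(age) FROM singer",
    "SELECT sum(age) FROM singer",   # a sampled outlier
]

votes = Counter()
for sql in candidates:
    try:
        votes[str(conn.execute(sql).fetchall())] += 1
    except sqlite3.Error:
        pass  # queries that fail to execute are discarded

print(votes.most_common(1)[0])  # the majority execution result wins
```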
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting syntax into the question-schema interaction graph encoder for Text-to-SQL parsing.
We also employ a decoupling constraint to induce diverse edge embeddings, which further improves the network's performance.
Experiments on Spider and the robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-trained models are used.
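The core data structure, a question-schema interaction graph with an extra syntactic edge type, can be sketched without any model code. The node names, edge types, and hand-written dependency arcs below are illustrative assumptions; a real system would obtain the arcs from a dependency parser.

```python
# A toy question-schema interaction graph with an extra syntactic edge
# type. Node names, edge types, and the hand-written dependency arcs are
# illustrative; a real system would obtain the arcs from a parser.

question = ["show", "names", "of", "singers"]
schema = ["singer", "singer.name"]

edges = []
# question-to-schema linking edges (string matching stands in for a linker)
for i, tok in enumerate(question):
    for node in schema:
        if tok.rstrip("s") in node:
            edges.append((f"q{i}", node, "q-schema-link"))
# syntactic dependency edges between question tokens (hand-written here)
for head, dep in [(0, 1), (1, 3)]:
    edges.append((f"q{head}", f"q{dep}", "syntax-dep"))

for edge in edges:
    print(edge)
```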
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
- Improving Text-to-SQL with Schema Dependency Learning [22.07452161565993]
Execution-guided decoding relies on database execution, which slows down the inference process and is unsatisfactory for many real-world applications.
We present the Schema Dependency guided multi-task Text-to-SQL model (SDSQL) to guide the network to effectively capture the interactions between questions and schemas.
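The multi-task idea reduces to a joint objective over two heads. The sketch below is a schematic of that combination only; the loss values and the weighting factor are placeholder assumptions, not SDSQL's actual training setup.

```python
# A schematic of the multi-task objective: the main Text-to-SQL loss is
# combined with an auxiliary schema-dependency prediction loss. The loss
# values and the weight are placeholder assumptions.

def multi_task_loss(sql_loss: float, dependency_loss: float, lam: float = 0.5) -> float:
    """Joint objective over the generation head and the dependency head."""
    return sql_loss + lam * dependency_loss

# e.g. per-batch losses reported by the two heads
print(multi_task_loss(sql_loss=1.8, dependency_loss=0.6))
```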
arXiv Detail & Related papers (2021-03-07T16:56:56Z)