Related papers: CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models

URL: http://arxiv.org/abs/2504.00882v1
Date: Tue, 01 Apr 2025 15:11:03 GMT
Title: CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models
Authors: Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li
Abstract summary: CrackSQL is the first hybrid SQL dialect translation system that combines rule-based and LLM-based methods to overcome the limitations of existing approaches. CrackSQL supports three translation modes and offers multiple deployment options, including a web console interface, a PyPI package, and a command-line prompt.
Score: 20.718779783349984
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches, including manual rewriting, rule-based systems, and large language model (LLM)-based techniques, often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., LLM generates non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy, complex SQL queries via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases.
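To make the rule-versus-LLM trade-off in the abstract concrete, here is a toy sketch of a hybrid PostgreSQL-to-MySQL translator, assuming a "rules first, fallback second" design. It is not CrackSQL's implementation: the names `RULES`, `UNSUPPORTED`, `translate`, and `fallback` are hypothetical, and `fallback` only stands in for an LLM-based translator. The dialect facts used are standard: PostgreSQL's `col::text` cast maps to MySQL's `CAST(col AS CHAR)`, `STRING_AGG(col, 'sep')` maps to `GROUP_CONCAT(col SEPARATOR 'sep')`, and MySQL has no direct counterpart for `DISTINCT ON` or `GENERATE_SERIES`.

```python
import re
from typing import Callable

# Hypothetical rule table for a toy PostgreSQL -> MySQL translator.
# Each rule is a regex over the raw SQL text plus a replacement template.
RULES = [
    # PostgreSQL cast shorthand `col::text` -> MySQL `CAST(col AS CHAR)`
    (re.compile(r"\b(\w+)::text\b", re.IGNORECASE), r"CAST(\1 AS CHAR)"),
    # STRING_AGG(col, 'sep') -> GROUP_CONCAT(col SEPARATOR 'sep')
    (re.compile(r"\bSTRING_AGG\((\w+),\s*('[^']*')\)", re.IGNORECASE),
     r"GROUP_CONCAT(\1 SEPARATOR \2)"),
]

# PostgreSQL constructs with no direct MySQL counterpart that the rules above do not cover.
UNSUPPORTED = re.compile(r"\b(DISTINCT\s+ON|GENERATE_SERIES)\b", re.IGNORECASE)


def translate(sql: str, fallback: Callable[[str], str]) -> str:
    """Apply cheap deterministic rules first; hand leftover constructs to `fallback`.

    `fallback` is a hypothetical stand-in for an LLM-based translator.
    """
    for pattern, replacement in RULES:
        sql = pattern.sub(replacement, sql)
    if UNSUPPORTED.search(sql):
        return fallback(sql)
    return sql


if __name__ == "__main__":
    print(translate("SELECT STRING_AGG(name, ',') FROM t", fallback=lambda s: s))
```

Regex rules like these silently skip nested expressions (for example, `(a + b)::text` is left untranslated) and cannot see inside string literals, which illustrates the maintenance burden the abstract attributes to purely rule-based systems.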

Related papers

RISE: Rule-Driven SQL Dialect Translation via Query Reduction [14.187357850698993]
Large language models (LLMs) can assist in translating SQL dialects, but they often struggle with lengthy and complex queries. We propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex queries. We evaluate RISE on two real-world benchmarks, TPC-DS and SQLBench, comparing its performance against both traditional rule-based tools and LLM-based approaches.
arXiv Detail & Related papers (2026-01-09T07:00:44Z)
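The RISE summary does not say how its query reduction works. As a generic illustration only (not RISE's algorithm), a greedy one-at-a-time reduction shrinks a failing query to the fragments that still reproduce the failure, so that a smaller input can be handed to a translator. `greedy_reduce` and the `still_fails` oracle are hypothetical names; splitting a query into clause strings is a simplifying assumption.

```python
from typing import Callable, List


def greedy_reduce(parts: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Drop parts one at a time while the reduced input still reproduces the failure.

    `parts` are query fragments (e.g. clauses); `still_fails` is a hypothetical
    oracle such as "the translated fragment is rejected by the target database".
    The input list is not modified; a new, reduced list is returned.
    """
    if not still_fails(parts):
        raise ValueError("reduction must start from a failing input")
    i = 0
    while i < len(parts):
        candidate = parts[:i] + parts[i + 1:]
        if still_fails(candidate):
            parts = candidate  # part i is irrelevant to the failure: drop it
        else:
            i += 1  # part i is needed to reproduce the failure: keep it
    return parts


if __name__ == "__main__":
    clauses = ["SELECT DISTINCT ON (a) a, b", "FROM t", "WHERE b > 1", "ORDER BY a"]
    # Toy oracle: pretend translation fails whenever DISTINCT ON is present.
    print(greedy_reduce(clauses, lambda ps: any("DISTINCT ON" in p for p in ps)))
```

Each pass costs one oracle call per remaining fragment, so this simple loop is quadratic in the worst case; delta-debugging variants remove larger chunks at once to cut the number of calls.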
Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation [54.53145282349042]
We introduce DSR-SQL, a Dual-State Reasoning framework that models Text-to-SQL as an interaction between an adaptive context state and a progressive generation state. Without any post-training or in-context examples, DSR-SQL achieves competitive performance, reaching 35.28% execution accuracy on Spider 2.0-Snow and 68.32% on the BIRD development set.
arXiv Detail & Related papers (2025-11-26T13:52:50Z)
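Several summaries in this list report execution accuracy (EX): the fraction of examples where running the predicted SQL returns the same result as running the gold SQL. The sketch below is a simplified version under stated assumptions (row order ignored, duplicates counted, a predicted query that errors counts as a miss); official evaluation scripts for benchmarks such as Spider and BIRD differ in details. `execution_match` and `execution_accuracy` are hypothetical names, and SQLite stands in for the benchmark databases.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Share of (predicted, gold) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    pairs = [
        ("SELECT a FROM t WHERE a > 1", "SELECT a FROM t WHERE a >= 2"),  # same rows
        ("SELECT a FROM t", "SELECT a FROM t WHERE a = 1"),  # different rows
        ("SELECT missing FROM t", "SELECT a FROM t"),  # predicted query errors
    ]
    print(execution_accuracy(conn, pairs))  # 1 of 3 pairs match
```

Comparing executed results rather than SQL strings credits semantically equivalent rewrites, as in the first pair, at the cost of occasionally crediting a wrong query that happens to return the right rows on the test database.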
HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration [1.3927943269211591]
Text-to-SQL bridges the gap between natural language and databases, enabling users to query data without requiring SQL expertise. We propose HI-SQL, a pipeline that incorporates a novel hint generation mechanism utilizing historical query logs. By analyzing prior queries, our method generates contextual hints that focus on handling the complexities of multi-table and nested operations. Our approach significantly improves the accuracy of LLM-generated queries while ensuring efficiency in terms of LLM calls and latency.
arXiv Detail & Related papers (2025-06-11T12:07:55Z)
Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text [3.4688186440441893]
Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks, including semantic parsing, which translates natural language into code. The reverse process, translating code into natural language, termed semantic captioning, has received less attention. In this paper, we focus on the captioning of SQL queries (SQL2Text) to address the critical need for understanding and explaining SQL queries.
arXiv Detail & Related papers (2025-01-06T17:36:09Z)
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL-SQL that combines bidirectional schema linking, contextual information augmentation, a binary selection strategy, and multi-turn self-correction. Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on Spider using GPT-4o. Our approach also outperforms a series of GPT-4-based Text-to-SQL systems when adopting DeepSeek (much cheaper) with the same intact prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z)
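Schema linking, which RSL-SQL and E-SQL both build on, means identifying which tables and columns a question refers to. The sketch below is a deliberately naive lexical baseline, not the bidirectional linking of RSL-SQL or the question enrichment of E-SQL: a table or column is linked when every word of its snake_case name appears in the question. `link_schema`, `_words`, and the example schema are hypothetical.

```python
import re
from typing import Dict, List, Set


def _words(identifier: str) -> Set[str]:
    """Split a snake_case identifier into lowercase words."""
    return set(identifier.lower().split("_"))


def link_schema(question: str, schema: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return {table: matched columns} for every table whose name or columns match.

    A name matches when all of its words appear as tokens in the question.
    """
    tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    linked: Dict[str, Set[str]] = {}
    for table, columns in schema.items():
        matched_cols = {c for c in columns if _words(c) <= tokens}
        if _words(table) <= tokens or matched_cols:
            linked[table] = matched_cols
    return linked


if __name__ == "__main__":
    schema = {"singer": ["name", "age", "country"], "concert": ["concert_name", "year"]}
    print(link_schema("What is the name and age of every singer from France?", schema))
```

Exact word matching misses plurals and synonyms: for "Which year had the most concerts?" the table `concert` is linked only through its `year` column, because "concerts" does not equal "concert". Closing that gap is what more sophisticated linking methods aim at.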
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-SQL, a novel pipeline specifically designed to address key Text-to-SQL challenges through direct schema linking and candidate predicate augmentation. E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question and SQL construction plan, bridging the gap between the query and the database structure. Comprehensive evaluations show that E-SQL achieves competitive performance, particularly excelling on complex queries, with 66.29% execution accuracy on the test set.
arXiv Detail & Related papers (2024-09-25T09:02:48Z)
PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks. In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy [24.919119901664843]
This paper introduces SQLfuse, a robust system integrating open-source Large Language Models (LLMs) with a suite of tools to enhance query accuracy and usability. Its effectiveness is demonstrated by its leading performance on the Spider Leaderboard and its deployment at Ant Group.
arXiv Detail & Related papers (2024-07-19T06:01:57Z)
RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL [1.734218686180302]
This paper introduces a method for Text-to-SQL based on a Refined Schema and Hardness Prompt. It reduces storage and training costs while maintaining performance. Our experiments on the Spider dataset, specifically with large-scale LMs, achieved an execution accuracy (EX) of 82.6%.
arXiv Detail & Related papers (2024-06-13T14:04:34Z)
Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation [10.812409371488913]
We propose a unified generate-then-rank framework that can be flexibly incorporated with existing NLIDBs to consistently improve translation accuracy. Metasql introduces query metadata to control the generation of better query candidates and uses learning-to-rank algorithms to retrieve globally optimized queries. The results show that the performance of the translation models can be effectively improved using Metasql.
arXiv Detail & Related papers (2024-02-27T02:16:07Z)
Structure Guided Large Language Model for SQL Generation [14.079764882536077]
We propose a novel structure-guided text-to-SQL framework (SGU-SQL). SGU-SQL consistently outperforms state-of-the-art text-to-SQL models.
arXiv Detail & Related papers (2024-02-19T09:07:59Z)
DBCopilot: Natural Language Querying over Massive Databases via Schema Routing [47.009638761948466]
We present DBCopilot, a framework that addresses the challenges of natural language querying over massive databases by employing a compact and flexible copilot model for routing. This framework utilizes a single lightweight differentiable search index to construct semantic mappings for massive database schemata, and navigates natural language questions to their target databases and tables in a relation-aware joint retrieval manner.
arXiv Detail & Related papers (2023-12-06T12:37:28Z)
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
SQLPrompt is designed to improve the few-shot prompting capabilities of LLMs for Text-to-SQL. SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
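The SQL-PaLM summary mentions consistency decoding with execution-based analysis. One common form of this idea, sketched below under the assumption that several candidate queries have already been sampled from an LLM, is majority voting over execution results: discard candidates that fail to run, group the rest by what they return, and keep a query from the largest group. SQL-PaLM's exact procedure may differ; `execution_vote` and `result_key` are hypothetical names, and SQLite stands in for the target database.

```python
import sqlite3
from collections import Counter
from typing import List, Optional


def result_key(rows) -> frozenset:
    """Order-insensitive, duplicate-aware fingerprint of a result set."""
    return frozenset(Counter(rows).items())


def execution_vote(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Return a candidate whose execution result is shared by the most candidates.

    Candidates that raise an error are discarded; ties go to the result seen first.
    Returns None if no candidate executes successfully.
    """
    executed = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # execution errors are filtered out before voting
        executed.append((sql, result_key(rows)))
    if not executed:
        return None
    votes = Counter(key for _, key in executed)
    winner = votes.most_common(1)[0][0]
    return next(sql for sql, key in executed if key == winner)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    samples = ["SELECT a FROM t", "SELECT a FROM t WHERE a > 1",
               "SELECT bogus FROM t", "SELECT a FROM t WHERE a >= 2"]
    print(execution_vote(conn, samples))
```

Voting on results rather than SQL text lets syntactically different but equivalent candidates support each other, which is the main reason execution-based consistency tends to beat picking the single most likely sample.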
XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks. This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query. We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
xDBTagger: Explainable Natural Language Interface to Databases Using Keyword Mappings and Schema Graph [0.17188280334580192]
Translating natural language queries (NLQ) into structured query language (SQL) in interfaces to relational databases is a challenging task. We propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. xDBTagger is effective in terms of accuracy and translates queries up to 10,000 times more efficiently than other state-of-the-art pipeline-based systems.
arXiv Detail & Related papers (2022-10-07T18:17:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.