RISE: Rule-Driven SQL Dialect Translation via Query Reduction
- URL: http://arxiv.org/abs/2601.05579v1
- Date: Fri, 09 Jan 2026 07:00:44 GMT
- Title: RISE: Rule-Driven SQL Dialect Translation via Query Reduction
- Authors: Xudong Xie, Yuwei Zhang, Wensheng Dou, Yu Gao, Ziyu Cui, Jiansen Song, Rui Yang, Jun Wei
- Abstract summary: Large language models (LLMs) can assist in translating SQL dialects, but they often struggle with lengthy and complex queries. We propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex queries. We evaluate RISE on two real-world benchmarks, TPC-DS and SQLProcBench, comparing its performance against both traditional rule-based tools and LLM-based approaches.
- Score: 14.187357850698993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation tools rely on manually-crafted rules, necessitating significant manual effort to support new RDBMSs and dialects. Although large language models (LLMs) can assist in translating SQL dialects, they often struggle with lengthy and complex SQL queries. In this paper, we propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex SQL queries. Given a complex source query $Q_c$ that contains a SQL dialect $d$, we first employ a dialect-aware query reduction technique to derive a simplified query $Q_{s}$ by removing $d$-irrelevant SQL elements from $Q_c$. Subsequently, we utilize LLMs to translate $Q_{s}$ into $Q_{s^{'}}$, and automatically extract the translation rule $r_d$ for dialect $d$ based on the relationship between $Q_{s}$ and $Q_{s^{'}}$. By applying $r_d$ to $Q_c$, we can effectively translate the dialect $d$ within $Q_c$, thereby bypassing the complexity of the source query $Q_c$. We evaluate RISE on two real-world benchmarks, i.e., TPC-DS and SQLProcBench, comparing its performance against both the traditional rule-based tools and the LLM-based approaches with respect to translation accuracy. RISE achieves accuracies of 97.98% on TPC-DS and 100% on SQLProcBench, outperforming the baselines by an average improvement of 24.62% and 238.41%, respectively.
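The four steps in the abstract (reduce $Q_c$ to $Q_s$, translate $Q_s$ to $Q_{s'}$, extract $r_d$, apply $r_d$ to $Q_c$) can be illustrated with a toy sketch. This is not the authors' implementation. The `Query` class, the `IFNULL`/`COALESCE` example (MySQL to PostgreSQL), the helper names, and the stubbed "LLM" are all assumptions made for illustration; a real system would presumably reduce over a parse tree and call an actual model.

```python
import difflib
import re
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical, highly simplified query model (not from the paper)."""
    select: list[str]
    source: str
    clauses: list[str] = field(default_factory=list)

    def to_sql(self) -> str:
        parts = ["SELECT " + ", ".join(self.select), "FROM " + self.source, *self.clauses]
        return " ".join(parts)


def reduce_query(q_c: Query, is_dialect) -> Query:
    """Drop select items and clauses that do not contain the dialect feature d."""
    kept_items = [item for item in q_c.select if is_dialect(item)]
    kept_clauses = [clause for clause in q_c.clauses if is_dialect(clause)]
    # Keep at least one select item so the reduced query stays well-formed.
    return Query(kept_items or q_c.select[:1], q_c.source, kept_clauses)


def stub_llm_translate(sql: str) -> str:
    """Stand-in for an LLM call; hard-codes one MySQL -> PostgreSQL rewrite."""
    return re.sub(r"\bIFNULL\b", "COALESCE", sql)


def tokenize(sql: str) -> list[str]:
    return re.findall(r"\w+|[^\w\s]", sql)


def extract_rule(q_s: str, q_s_prime: str) -> list[tuple[str, str]]:
    """Derive (source, target) rewrites from the token-level diff of Q_s and Q_s'."""
    a, b = tokenize(q_s), tokenize(q_s_prime)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def apply_rule(sql: str, rule: list[tuple[str, str]]) -> str:
    """Apply each rewrite to the full query; only handles identifier-like tokens."""
    for src, dst in rule:
        sql = re.sub(r"\b" + re.escape(src) + r"\b", dst, sql)
    return sql


q_c = Query(
    select=["IFNULL(c.nickname, c.name) AS label", "SUM(o.amount) AS total"],
    source="customers c",
    clauses=[
        "JOIN orders o ON o.customer_id = c.id",
        "WHERE o.amount > 0",
        "GROUP BY label",
        "ORDER BY total DESC",
    ],
)


def has_ifnull(text: str) -> bool:
    return re.search(r"\bIFNULL\s*\(", text) is not None


q_s = reduce_query(q_c, has_ifnull).to_sql()   # simplified query Q_s
q_s_prime = stub_llm_translate(q_s)            # translated simplified query Q_s'
rule = extract_rule(q_s, q_s_prime)            # rule r_d == [("IFNULL", "COALESCE")]
translated = apply_rule(q_c.to_sql(), rule)    # r_d applied to the full Q_c

print(q_s)
print(rule)
print(translated)
```

The point of the sketch is that the (stubbed) model only ever sees the short query `q_s`, while the long query `q_c` is rewritten mechanically by the extracted rule.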
Related papers
- PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation [21.0303026118673]
We introduce PARROT, a practical and realistic benchmark for CrOss-System SQL Translation. PARROT comprises 598 translation pairs from 38 open-source benchmarks and real-world business services. We also provide multiple benchmark variants, including PARROT-Diverse with 28,003 translations and PARROT-Simple with 5,306 representative samples.
arXiv Detail & Related papers (2025-09-27T14:41:13Z) - End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation [6.5390580456423555]
Traditional approaches model text-to-SQL as a direct translation task. Recent advances in large language models (LLMs) have significantly improved translation accuracy. We propose a three-stage end-to-end text-to-SQL framework to identify the user's intended database.
arXiv Detail & Related papers (2025-08-08T15:16:36Z) - CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models [20.718779783349984]
CrackSQL is the first hybrid SQL dialect translation system that combines rule-based and LLM-based methods to overcome their limitations. CrackSQL supports three translation modes and offers multiple deployment options, including a web console interface, a PyPI package, and a command-line prompt.
arXiv Detail & Related papers (2025-04-01T15:11:03Z) - Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking [15.42143912008553]
We introduce a novel, realistic, and sufficiently complex benchmark called SQLEquiQuest for query equivalence checking. We evaluate several state-of-the-art LLMs using various prompting strategies and carefully constructed in-context learning examples. Our analysis shows that LLMs exhibit a strong bias for equivalence predictions, with consistently poor performance over non-equivalent pairs.
arXiv Detail & Related papers (2024-12-07T06:50:12Z) - RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [51.00761167842468]
We propose a novel framework called RSL-SQL that combines bidirectional schema linking, contextual information augmentation, a binary selection strategy, and multi-turn self-correction.
Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves SOTA execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on Spider using GPT-4o.
Our approach outperforms a series of GPT-4 based Text-to-SQL systems when adopting DeepSeek (much cheaper) with the same prompts.
arXiv Detail & Related papers (2024-10-31T16:22:26Z) - PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z) - MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [47.120862170230566]
Recent Text-to-SQL methods usually suffer from significant performance degradation on "huge" databases. We introduce MAC-SQL, a novel LLM-based multi-agent collaborative framework for Text-to-SQL. In our framework, we leverage GPT-4 as the strong backbone for all agent tasks to determine the upper bound of our framework. We then fine-tune an open-source instruction-following model, SQL-Llama, by leveraging Code Llama 7B, to accomplish all tasks as GPT-4 does.
arXiv Detail & Related papers (2023-12-18T14:40:20Z) - Benchmarking and Improving Text-to-SQL Generation under Ambiguity [25.283118418288293]
We develop a novel benchmark called AmbiQT where each text is interpretable as two plausible SQLs due to lexical and/or structural ambiguity.
We propose LogicalBeam, a new decoding algorithm that navigates the SQL logic space using a blend of plan-based template generation and constrained infilling.
arXiv Detail & Related papers (2023-10-20T17:00:53Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces SQL-PaLM, a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.