Taming SQL Complexity: LLM-Based Equivalence Evaluation for Text-to-SQL
- URL: http://arxiv.org/abs/2506.09359v1
- Date: Wed, 11 Jun 2025 03:16:39 GMT
- Title: Taming SQL Complexity: LLM-Based Equivalence Evaluation for Text-to-SQL
- Authors: Qingyun Zeng, Simin Ma, Arash Niknafs, Ashish Basran, Carol Szabo,
- Abstract summary: This paper explores using Large Language Models (LLMs) to assess both semantic and a more practical "weak" semantic equivalence. We analyze common patterns of SQL equivalence and inequivalence and discuss challenges in LLM-based evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of Large Language Models (LLMs) has significantly advanced Text-to-SQL (NL2SQL) systems, yet evaluating the semantic equivalence of generated SQL remains a challenge, especially given ambiguous user queries and multiple valid SQL interpretations. This paper explores using LLMs to assess both semantic and a more practical "weak" semantic equivalence. We analyze common patterns of SQL equivalence and inequivalence and discuss challenges in LLM-based evaluation.
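The paper's exact definition of "weak" semantic equivalence is not given in this summary; a common execution-based proxy, sketched below under that assumption, treats two queries as equivalent on a given database instance if they return the same multiset of rows, ignoring row order. The schema, data, and function name here are hypothetical, for illustration only:

```python
import sqlite3
from collections import Counter

def weak_equivalent(sql_a: str, sql_b: str, conn: sqlite3.Connection) -> bool:
    """Execution-based proxy: two queries are 'weakly' equivalent on this
    database if they return the same multiset of rows (order ignored)."""
    rows_a = conn.execute(sql_a).fetchall()
    rows_b = conn.execute(sql_b).fetchall()
    return Counter(rows_a) == Counter(rows_b)

# Toy schema and data, purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "eng", 100.0), (2, "eng", 90.0), (3, "hr", 80.0)])

# Syntactically different, but equivalent on this database instance.
q1 = "SELECT dept, MAX(salary) FROM emp GROUP BY dept"
q2 = "SELECT dept, MAX(salary) FROM emp GROUP BY dept ORDER BY dept"
print(weak_equivalent(q1, q2, conn))  # True on this instance
```

Note that passing on one instance does not prove full semantic equivalence, which is one reason the paper turns to LLM-based judgment for the general case.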
Related papers
- Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types [11.391598870596392]
Large language models (LLMs) have significantly advanced text-to-SQL systems. However, LLMs often narrowly focus on SQL generation, neglecting the complexities of real-world conversational queries. We propose MMSQL, a test suite designed to evaluate the question classification and SQL generation capabilities of LLMs.
arXiv Detail & Related papers (2024-12-21T10:13:45Z)
- Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking [15.42143912008553]
We introduce SQLEquiQuest, a novel, realistic, and sufficiently complex benchmark for query equivalence checking. We evaluate several state-of-the-art LLMs using various prompting strategies and carefully constructed in-context learning examples. Our analysis shows that LLMs exhibit a strong bias toward equivalence predictions, with consistently poor performance on non-equivalent pairs.
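A bias like the one reported above is easy to hide behind overall accuracy; scoring a judge per gold label makes it visible. A minimal sketch with hypothetical predictions (not SQLEquiQuest data):

```python
from collections import defaultdict

def per_class_accuracy(preds, labels):
    """Accuracy broken down by gold label, exposing a judge that
    over-predicts one class (e.g. 'equivalent')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += (p == y)
    return {y: correct[y] / total[y] for y in total}

# Hypothetical judge that answers "eq" far too often.
labels = ["eq"] * 50 + ["neq"] * 50
preds  = ["eq"] * 50 + ["eq"] * 40 + ["neq"] * 10
acc = per_class_accuracy(preds, labels)
print(acc)  # {'eq': 1.0, 'neq': 0.2} — 60% overall, but 20% on non-equivalent pairs
```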
arXiv Detail & Related papers (2024-12-07T06:50:12Z)
- Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement [1.392448435105643]
Text-to-SQL enables non-expert users to effortlessly retrieve desired information from databases using natural language queries.
Current state-of-the-art (SOTA) models like GPT-4 and T5 have shown impressive performance on large-scale benchmarks like BIRD.
This paper proposes a novel approach that needs only SQL quality measurement to enhance Text-to-SQL performance.
arXiv Detail & Related papers (2024-10-02T17:21:51Z)
- PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
- RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task.
We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
- Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [15.75829309721909]
Large language models (LLMs) have shown significant capabilities in natural language understanding as model scale increases. LLMs can bring unique opportunities, improvements, and solutions to text-to-SQL research.
arXiv Detail & Related papers (2024-06-12T17:13:17Z)
- LLM-SQL-Solver: Can LLMs Determine SQL Equivalence? [7.59813709836711]
Large Language Models (LLMs) have shown strong reasoning capability in conversation, question answering, and solving challenges. To assist LLMs in generating high-quality responses, we present two prompting techniques: Miniature & Mull and Explain & Compare.
arXiv Detail & Related papers (2023-12-16T05:01:23Z)
- Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deeply into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted), and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.