ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement
- URL: http://arxiv.org/abs/2603.03742v1
- Date: Wed, 04 Mar 2026 05:27:20 GMT
- Title: ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement
- Authors: Zijin Hong, Hao Chen, Zheng Yuan, Qinggang Zhang, Luyao Zhuang, Qing Liao, Feiran Huang, Yangqiu Song, Xiao Huang
- Abstract summary: We propose ErrorLLM, a framework that explicitly models text-to-SQL errors for refinement. We show that ErrorLLM achieves the most significant improvements over the backbone's initial generation. ErrorLLM addresses both sides with a high detection F1 score while maintaining refinement effectiveness.
- Score: 57.98138819417949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable performance of large language models (LLMs) in text-to-SQL generation, producing correct SQL queries at the first attempt remains challenging. The SQL refinement task was subsequently introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging becomes increasingly ineffective because modern LLMs rarely produce the explicit execution errors that trigger debugging signals; (ii) self-correction exhibits low detection precision due to the lack of explicit error modeling grounded in the question and schema, and suffers from severe hallucination that frequently corrupts correct SQL queries. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a well-designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting dedicated error tokens. Guided by the detected errors, we perform error-guided refinement of the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the most significant improvements over the backbone's initial generation. Further analysis reveals that detection quality directly determines refinement effectiveness, and ErrorLLM addresses both sides by achieving a high detection F1 score while maintaining refinement effectiveness.
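The "static detection" stage described above can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the paper's actual implementation: it runs a candidate query against the database to surface execution failures, and scans the SQL's identifiers for names absent from the schema (a cheap proxy for the "surface mismatch" check). The function name `static_detect`, the keyword list, and the demo schema are all hypothetical.

```python
import re
import sqlite3

# Common SQL keywords to skip when scanning identifiers (hypothetical, not exhaustive).
KEYWORDS = {"select", "from", "where", "and", "or", "not", "count", "avg", "sum",
            "group", "by", "order", "limit", "as", "join", "on", "distinct"}

def static_detect(sql, conn, schema_names):
    """Return a list of error labels for a candidate SQL query.

    Combines an execution check (syntax errors, missing tables/columns)
    with a surface check (identifiers that match no schema name).
    """
    errors = []
    # 1) Execution check: the database engine catches syntactic and
    #    schema-level failures directly.
    try:
        conn.execute(sql).fetchmany(1)
    except sqlite3.Error as exc:
        errors.append(f"execution_error: {exc}")
    # 2) Surface check: identifiers in the SQL that are neither keywords
    #    nor schema names signal a likely hallucinated table/column.
    for ident in re.findall(r"[A-Za-z_]\w*", sql):
        if ident.lower() not in KEYWORDS and ident not in schema_names:
            errors.append(f"surface_mismatch: unknown identifier '{ident}'")
    return errors

# Demo on a toy in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
schema = {"users", "id", "name", "age"}
print(static_detect("SELECT nmae FROM users", conn, schema))  # flags both checks
print(static_detect("SELECT name FROM users", conn, schema))  # []
```

Implicit semantic errors (a query that executes and references only valid names but answers the wrong question) escape both checks; those are what the paper's dedicated error tokens are meant to capture.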
Related papers
- Hallucination Detection for LLM-based Text-to-SQL Generation via Two-Stage Metamorphic Testing [8.942002314582789]
Large language models (LLMs) generate hallucinations, i.e., unrealistic or illogical content.
We propose a novel hallucination detection method based on metamorphic testing (MT) that does not require standard answers.
Trials demonstrate our method's superior performance in terms of F1-score, which ranges from 69.36% to 82.76%.
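The core consistency check behind metamorphic testing for text-to-SQL can be sketched as follows. This is a hedged simplification (the paper's two-stage method is more involved): queries generated from semantically equivalent rephrasings of a question should return identical results, so disagreement among their execution outputs flags a likely hallucination without needing a gold answer. The function name `metamorphic_flag` and the toy data are assumptions for illustration.

```python
import sqlite3

def metamorphic_flag(sql_variants, conn):
    """Return True if the variant SQLs' execution results disagree."""
    results = []
    for sql in sql_variants:
        try:
            # Order-insensitive comparison of result sets.
            rows = frozenset(map(tuple, conn.execute(sql).fetchall()))
        except sqlite3.Error:
            rows = None  # an execution failure also counts as disagreement
        results.append(rows)
    return len(set(results)) > 1

# Demo: queries for "values above 1" phrased two equivalent ways agree;
# a query answering a different question does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
print(metamorphic_flag(["SELECT x FROM t WHERE x > 1",
                        "SELECT x FROM t WHERE x >= 2"], conn))  # False
print(metamorphic_flag(["SELECT x FROM t WHERE x > 1",
                        "SELECT x FROM t WHERE x > 2"], conn))   # True
```

The check is reference-free by design: it never consults a gold SQL, only internal consistency across rephrasings.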
arXiv Detail & Related papers (2025-12-24T04:04:26Z) - SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL [20.93676525997898]
We propose an end-to-end framework for fine-grained detection and correction of semantic errors in SQL queries generated by large language models (LLMs).
We show that our framework outperforms the best LLM-based self-evaluation method by 25.78% in F1 for error detection, and improves the execution accuracy of out-of-the-box text-to-SQL systems by up to 20%.
arXiv Detail & Related papers (2025-06-04T22:25:47Z) - SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL [18.493226915913638]
We propose SHARE, an SLM-based Hierarchical Action corREction assistant for text-to-SQL.
SHARE orchestrates three specialized Small Language Models (SLMs) in a sequential pipeline.
Experimental results demonstrate that SHARE effectively enhances self-correction capabilities while proving robust across various LLMs.
arXiv Detail & Related papers (2025-05-31T04:51:12Z) - SQLCritic: Correcting Text-to-SQL Generation via Clause-wise Critic [8.680252929322684]
We introduce a clause-wise critique generation task along with a benchmark, SQLCriticBench, which performs fine-grained error localization.
We also propose an automatic training-dataset curation pipeline that annotates clause-wise critiques at scale.
arXiv Detail & Related papers (2025-03-11T02:52:39Z) - TGEA: An Error-Annotated Dataset and Benchmark Tasks for Text Generation from Pretrained Language Models [57.758735361535486]
TGEA is an error-annotated dataset for text generation from pretrained language models (PLMs).
We create an error taxonomy covering 24 types of errors occurring in PLM-generated sentences.
This is the first dataset with comprehensive annotations for PLM-generated texts.
arXiv Detail & Related papers (2025-03-06T09:14:02Z) - Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework [79.40678802098026]
Math Word Problems serve as a crucial benchmark for evaluating the reasoning abilities of Large Language Models.
Current error classification methods rely on static, predefined categories.
We propose Error-Aware Prompting (EAP), which incorporates common error patterns as explicit guidance.
arXiv Detail & Related papers (2025-01-26T16:17:57Z) - ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
TOOLSCAN is a new benchmark for identifying error patterns in LLM outputs on tool-use tasks.
We show that even the most prominent LLMs exhibit these error patterns in their outputs.
Researchers can use the insights from TOOLSCAN to guide their error-mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection [23.423794784621368]
Large Language Models (LLMs) face challenges due to schema issues and a lack of domain-specific database knowledge.
This paper introduces a knowledge-injection method that enhances LLMs' ability to understand database contents by incorporating prior knowledge.
arXiv Detail & Related papers (2024-09-24T09:24:03Z) - DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce Decomposed Automation Correction (DAC), which corrects text-to-SQL by decomposing it into entity linking and skeleton parsing.
We show that our method improves performance by 3.7% on average across Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z) - Synthesizing Text-to-SQL Data from Weak and Strong LLMs [68.69270834311259]
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks.
We introduce a synthetic data approach that combines data produced by larger, more powerful models with error-information data generated by smaller, less well-aligned models.
arXiv Detail & Related papers (2024-08-06T15:40:32Z) - Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given databases.
We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results with up to a 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.