Related papers: SQLSpace: A Representation Space for Text-to-SQL to Discover and Mitigate Robustness Gaps

URL: http://arxiv.org/abs/2510.27532v1
Date: Fri, 31 Oct 2025 15:05:11 GMT
Title: SQLSpace: A Representation Space for Text-to-SQL to Discover and Mitigate Robustness Gaps
Authors: Neha Srikanth, Victor Bursztyn, Puneet Mathur, Ani Nenkova,
Abstract summary: SQLSpace is a compact representation for text-to-SQL examples derived with minimal human intervention. It reveals compositional differences between benchmarks, exposes performance patterns obscured by accuracy alone, and supports modeling of query success.
Score: 23.866638742325502
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce SQLSpace, a human-interpretable, generalizable, compact representation for text-to-SQL examples derived with minimal human intervention. We demonstrate the utility of these representations in evaluation with three use cases: (i) closely comparing and contrasting the composition of popular text-to-SQL benchmarks to identify unique dimensions of examples they evaluate, (ii) understanding model performance at a granular level beyond overall accuracy scores, and (iii) improving model performance through targeted query rewriting based on learned correctness estimation. We show that SQLSpace enables analysis that would be difficult with raw examples alone: it reveals compositional differences between benchmarks, exposes performance patterns obscured by accuracy alone, and supports modeling of query success.

Related papers

Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL [30.78817492504152]
HERO is a hierarchical representation approach that integrates global intent and local details. We employ a Nested Message Passing Neural Network (NMPNN) to capture inherent information in relational schema-guided semantics. Our approach outperforms existing state-of-the-art methods, achieving an average improvement of 9.40% in AUPRC and 12.35% in AUROC in identifying semantic inconsistencies. It excels at detecting fine-grained semantic errors, provides large language models with more granular feedback, and ultimately enhances the reliability and interpretability of data querying platforms.
arXiv Detail & Related papers (2025-12-28T02:25:33Z)
Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation [54.53145282349042]
We introduce DSR-SQL, a Dual-State Reasoning framework that models Text-to-SQL as an interaction between an adaptive context state and a progressive generation state. Without any post-training or in-context examples, DSR-SQL achieves competitive performance, reaching 35.28% execution accuracy on Spider 2.0-Snow and 68.32% on the BIRD development set.
arXiv Detail & Related papers (2025-11-26T13:52:50Z)
The Interpretability Analysis of the Model Can Bring Improvements to the Text-to-SQL Task [3.890033714780255]
We integrate model interpretability analysis with an execution-guided strategy for semantic parsing of WHERE clauses. Our model excels on the WikiSQL dataset, which is emblematic of single-table database query tasks. Our hope is that this endeavor to enhance accuracy in processing basic database queries will offer fresh perspectives for research into handling complex queries.
arXiv Detail & Related papers (2025-08-12T11:24:16Z)
Confidence Estimation for Text-to-SQL in Large Language Models [5.5643498845134545]
We study the problem in the context of large language models (LLMs), where access to model weights and gradients is often constrained. We explore both black-box and white-box confidence estimation strategies, evaluating their effectiveness on cross-domain benchmarks.
arXiv Detail & Related papers (2025-08-08T23:09:45Z)
Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning [10.731045939849125]
We focus on Text-to-SQL semantic parsing from the perspective of retrieval-augmented generation. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose ASTReS, which dynamically retrieves input database information.
arXiv Detail & Related papers (2024-07-03T15:55:14Z)
Evaluating Cross-Domain Text-to-SQL Models and Benchmarks [7.388002745070808]
We study text-to-SQL benchmarks and re-evaluate some of the top-performing models within these benchmarks. We find that attaining perfect performance on these benchmarks is infeasible because the provided samples admit multiple interpretations. A GPT4-based model surpasses the gold standard reference queries in the Spider benchmark in our human evaluation.
arXiv Detail & Related papers (2023-10-27T23:36:14Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems. It is composed of publicly available text-to-SQL datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Error Detection for Text-to-SQL Semantic Parsing [18.068244400731366]
Modern text-to-SQL parsers are often over-confident, casting doubt on their trustworthiness when deployed for real use. We propose a parser-independent error detection model for text-to-SQL semantic parsing.
arXiv Detail & Related papers (2023-05-23T04:44:22Z)
SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural network based approaches (called SUN). Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences. Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data. Based on experimental results, neural semantic parsers that leverage GAP MODEL obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.