Related papers: Valid Text-to-SQL Generation with Unification-based DeepStochLog

Valid Text-to-SQL Generation with Unification-based DeepStochLog

URL: http://arxiv.org/abs/2503.13342v1
Date: Mon, 17 Mar 2025 16:21:10 GMT
Title: Valid Text-to-SQL Generation with Unification-based DeepStochLog
Authors: Ying Jiao, Luc De Raedt, Giuseppe Marra,
Abstract summary: We propose a neurosymbolic framework that imposes syntax and schema constraints with unification-based definite clause grammars.<n>Our framework also builds a bi-directional interface to language models to leverage their natural language understanding abilities.<n>This work is the first step towards extending language models with unification-based grammars.
Score: 13.798222228959132
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have been used to translate natural language questions to SQL queries. Without hard constraints on syntax and database schema, they occasionally produce invalid queries that are not executable. These failures limit the usage of these systems in real-life scenarios. We propose a neurosymbolic framework that imposes SQL syntax and schema constraints with unification-based definite clause grammars and thus guarantees the generation of valid queries. Our framework also builds a bi-directional interface to language models to leverage their natural language understanding abilities. The evaluation results on a subset of SQL grammars show that all our output queries are valid. This work is the first step towards extending language models with unification-based grammars. We demonstrate this extension enhances the validity, execution accuracy, and ground truth alignment of the underlying language model by a large margin. Our code is available at https://github.com/ML-KULeuven/deepstochlog-lm.

Related papers

End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation [6.5390580456423555]
Traditional approaches model text-to- query as a direct translation task.<n>Recent advances in large language models (LLMs) have significantly improved translation accuracy.<n>We propose a three-stage end-to-end text-to-end framework to identify the user's intended database.
arXiv Detail & Related papers (2025-08-08T15:16:36Z)
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects [24.450818792474216]
This work introduces Exe, a text-to-guided framework with execution-driven, agentic bootstrapping.<n>We show that Exe bridges the dialect gap in text-to-guided learning, achieving average improvements of 15.2%, 10.38%, and 4.49% over GPT-4o on, and Oracle, respectively.
arXiv Detail & Related papers (2025-05-22T19:13:34Z)
Disambiguate First Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing [56.82807063333088]
We propose a modular approach that resolves ambiguity using natural language interpretations before mapping these to logical forms. Our approach improves interpretation coverage and generalizes across datasets with different annotation styles, database structures, and ambiguity types.
arXiv Detail & Related papers (2025-02-25T18:42:26Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring [11.78795632771211]
We introduce a novel benchmark designed to evaluate text-to- reliability as a model's ability to correctly handle any type of input question. We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches.
arXiv Detail & Related papers (2024-03-23T16:12:52Z)
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-gressive translation tasks. Our model predicts queries as abstract syntax trees (ASTs) in an autore way, incorporating structural inductive bias in the executable and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems. It is composed of publicly available text-to-domain datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Text-to-SQL Error Correction with Language Models of Code [24.743066730684742]
In this paper, we investigate how to build automatic text-to-corpora error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead.
arXiv Detail & Related papers (2023-05-22T14:42:39Z)
Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [17.747079214502673]
Text-to- is a task that converts a natural language question into a structured query language () to retrieve information from a database. In this paper, we propose an LLM-based framework for Text-to- which retrieves helpful demonstration examples to prompt LLMs. We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity.
arXiv Detail & Related papers (2023-04-26T06:02:01Z)
XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks. This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query. We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-corpora parsing is to convert a natural language (NL) question to its corresponding structured query language () based on the evidences provided by databases. Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a mapping cannot be immediately determined. The proposed method effectively improves the robustness of text-to-native system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.