Related papers: Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

URL: http://arxiv.org/abs/2305.12552v1
Date: Sun, 21 May 2023 19:26:46 GMT
Title: Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Authors: Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao
Abstract summary: Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given relational databases. We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems. Experimental results demonstrate that Wav2SQL achieves state-of-the-art results, with up to 2.5% accuracy improvement over the baseline.
Score: 55.10009651476589
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given relational databases, which has been traditionally implemented in a cascaded manner while facing the following challenges: 1) model training is faced with the major issue of data scarcity, where limited parallel data is available; and 2) the systems should be robust enough to handle diverse out-of-domain speech samples that differ from the source data. In this work, we propose the first direct speech-to-SQL parsing model Wav2SQL which avoids error compounding across cascaded systems. Specifically, 1) to accelerate speech-driven SQL parsing research in the community, we release a large-scale and multi-speaker dataset MASpider; 2) leveraging the recent progress in the large-scale pre-training, we show that it alleviates the data scarcity issue and allows for direct speech-to-SQL parsing; and 3) we include the speech re-programming and gradient reversal classifier techniques to reduce acoustic variance and learn style-agnostic representations, improving generalization to unseen out-of-domain custom data. Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.

Related papers

SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL [20.49395306069103]
We introduce a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL generation. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration.
arXiv Detail & Related papers (2026-01-25T05:16:52Z)
Memo-SQL: Structured Decomposition and Experience-Driven Self-Correction for Training-Free NL2SQL [23.966546153810764]
Existing NL2SQL systems rely on in-context learning with only correct examples. We present Memo-SQL, setting a new state of the art among open, zero-fine-tuning methods.
arXiv Detail & Related papers (2026-01-15T02:42:05Z)
Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation [54.53145282349042]
We introduce DSR-SQL, a Dual-State Reasoning framework that models Text-to-SQL as an interaction between an adaptive context state and a progressive generation state. Without any post-training or in-context examples, DSR-SQL achieves competitive performance, reaching 35.28% execution accuracy on Spider 2.0-Snow and 68.32% on the BIRD development set.
arXiv Detail & Related papers (2025-11-26T13:52:50Z)
CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description [15.080310729603466]
CRED-SQL is a framework designed for large-scale databases that integrates Cluster Retrieval and Execution Description. It bridges the gap between natural language questions (NLQs) and their corresponding SQL queries. CRED-SQL achieves new state-of-the-art (SOTA) performance, validating its effectiveness and scalability.
arXiv Detail & Related papers (2025-08-18T09:43:07Z)
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL [35.21185734929167]
We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL. Our approach avoids curated intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency.
arXiv Detail & Related papers (2025-05-22T23:33:47Z)
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging [30.306023265985658]
We introduce a framework for generating high-quality synthetic training data for any dialect. We propose a novel Mixture-of-Experts (MoE) that leverages the shared knowledge across dialects.
arXiv Detail & Related papers (2024-08-22T20:50:48Z)
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
SQLPrompt is designed to improve the few-shot prompting capabilities of LLMs for Text-to-SQL. We show that SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation [13.196264569882777]
The current mainstream end-to-end Text2SQL model is not only difficult to build due to its complex structure and high requirements for training data, but also difficult to adjust due to its massive number of parameters. This paper proposes a pipeline method, SPSQL, to achieve the desired result. We construct the dataset based on the marketing business data of the State Grid Corporation of China.
arXiv Detail & Related papers (2023-05-10T10:01:36Z)
N-Best Hypotheses Reranking for Text-To-SQL Systems [6.966624873109535]
The Text-to-SQL task maps natural language utterances to structured queries. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models. Findings show significant potential improvements with reranking.
arXiv Detail & Related papers (2022-10-19T15:35:06Z)
SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases. Deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting Syntax into the question-schema interaction graph encoder for Text-to-SQL parsers. We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance. Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question [18.40290951253122]
Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets. This paper works towards designing more effective speech interfaces to query the structured data in databases. We propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries.
arXiv Detail & Related papers (2022-01-04T15:38:36Z)
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL [20.92732277474218]
We propose a novel decoupled multi-turn Text-to-SQL framework, where an utterance rewrite model first explicitly solves completion of dialogue context. A dual learning approach is also proposed for the utterance rewrite model to address the data sparsity problem. With just a few rewrite cases, the decoupled method outperforms the released state-of-the-art end-to-end models on both SParC and CoSQL datasets.
arXiv Detail & Related papers (2021-06-04T06:31:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.