Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
- URL: http://arxiv.org/abs/2305.12552v1
- Date: Sun, 21 May 2023 19:26:46 GMT
- Title: Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
- Authors: Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao
- Abstract summary: Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given databases.
We propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2SQL avoids error compounding and achieves state-of-the-art results, with up to a 2.5% accuracy improvement over the baseline.
- Score: 55.10009651476589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given
relational databases, which has been traditionally implemented in a cascaded
manner while facing the following challenges: 1) model training is faced with
the major issue of data scarcity, where limited parallel data is available; and
2) the systems should be robust enough to handle diverse out-of-domain speech
samples that differ from the source data. In this work, we propose the first
direct speech-to-SQL parsing model Wav2SQL which avoids error compounding
across cascaded systems. Specifically, 1) to accelerate speech-driven SQL
parsing research in the community, we release a large-scale and multi-speaker
dataset MASpider; 2) leveraging recent progress in large-scale
pre-training, we show that it alleviates the data scarcity issue and allows for
direct speech-to-SQL parsing; and 3) we include speech re-programming and
gradient reversal classifier techniques to reduce acoustic variance and learn
style-agnostic representations, improving generalization to unseen out-of-domain
custom data. Experimental results demonstrate that Wav2SQL avoids error
compounding and achieves state-of-the-art results, with up to a 2.5% accuracy
improvement over the baseline.
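The abstract names a gradient reversal classifier for learning style-agnostic representations but gives no implementation details. As a rough illustration of the general gradient reversal idea only (identity in the forward pass, gradient multiplied by -lam in the backward pass), the sketch below uses a toy scalar "encoder" and "style classifier". The class `GradientReversal`, the function `style_grads`, the scalar model, and the squared-error loss are all hypothetical and are not taken from Wav2SQL.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Hypothetical GRL: identity forward, gradient scaled by -lam backward."""
    lam: float = 1.0

    def forward(self, h: float) -> float:
        # Forward pass leaves the feature unchanged.
        return h

    def backward(self, grad_out: float) -> float:
        # Backward pass flips the sign (and scales) the incoming gradient.
        return -self.lam * grad_out


def style_grads(x: float, w_enc: float, w_cls: float,
                style_target: float, grl: GradientReversal):
    """Toy scalar model (assumption, not the paper's architecture):
    h = w_enc * x, s = w_cls * GRL(h), loss = 0.5 * (s - style_target) ** 2.
    Returns (loss, gradient for w_cls, gradient reaching w_enc)."""
    h = w_enc * x                   # hypothetical speech-encoder feature
    s = w_cls * grl.forward(h)      # hypothetical style (e.g. speaker) classifier
    loss = 0.5 * (s - style_target) ** 2
    d_s = s - style_target          # dL/ds
    d_w_cls = d_s * h               # classifier receives the ordinary gradient
    d_grl_out = d_s * w_cls         # dL/d(GRL output)
    d_h = grl.backward(d_grl_out)   # sign flipped before reaching the encoder
    d_w_enc = d_h * x
    return loss, d_w_cls, d_w_enc


if __name__ == "__main__":
    # Classifier gradient is +2.0; encoder gradient is -0.5 (it would be +0.5 without the GRL).
    print(style_grads(1.0, 2.0, 0.5, 0.0, GradientReversal(lam=1.0)))
```

Under gradient descent, the classifier therefore gets better at predicting style, while the encoder is pushed in the direction that makes style harder to predict, which is the usual motivation for gradient reversal. In an autograd framework such as PyTorch, the same behavior is typically written as a custom autograd function rather than with hand-derived gradients.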
Related papers
- SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging [30.306023265985658]
We introduce a framework for generating high-quality synthetic training data for any dialect.
We propose a novel Mixture-of-Experts (MoE) that leverages the shared knowledge across dialects.
(2024-08-22T20:50:48Z)
- SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
SQLPrompt is designed to improve the few-shot prompting capabilities of large language models (LLMs) for Text-to-SQL.
SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
(2023-11-06T05:24:06Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deeply into the critical paradigms that influence the performance of tuned LLMs.
(2023-05-26T21:39:05Z)
- SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation [13.196264569882777]
The current mainstream end-to-end Text2SQL model is not only difficult to build, due to its complex structure and high training-data requirements, but also difficult to adjust, due to its massive number of parameters.
This paper proposes a pipeline method, SPSQL, to achieve the desired result.
We construct the dataset based on the marketing business data of the State Grid Corporation of China.
(2023-05-10T10:01:36Z)
- N-Best Hypotheses Reranking for Text-To-SQL Systems [6.966624873109535]
The Text-to-SQL task maps natural language utterances to structured queries.
State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models.
Findings show significant potential improvements with reranking.
(2022-10-19T15:35:06Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
(2022-09-14T06:27:51Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question into its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output query.
(2022-08-29T14:24:13Z)
- S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting syntax into the question-schema graph encoder for Text-to-SQL parsers.
We also employ a decoupling constraint to induce diverse edge embeddings, which further improves the network's performance.
Experiments on Spider and the robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
(2022-03-14T09:49:15Z)
- Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question [18.40290951253122]
Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets.
This paper works towards designing more effective speech interfaces for querying structured data in databases.
We propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries.
(2022-01-04T15:38:36Z)
- Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL [20.92732277474218]
We propose a novel decoupled multi-turn Text-to-SQL framework, where an utterance rewrite model first explicitly resolves the completion of dialogue context.
A dual learning approach is also proposed for the utterance rewrite model to address the data sparsity problem.
With just a few rewrite cases, the decoupled method outperforms the released state-of-the-art end-to-end models on both the SParC and CoSQL datasets.
(2021-06-04T06:31:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.