Know What I don't Know: Handling Ambiguous and Unanswerable Questions
for Text-to-SQL
- URL: http://arxiv.org/abs/2212.08902v2
- Date: Fri, 19 May 2023 14:27:31 GMT
- Title: Know What I don't Know: Handling Ambiguous and Unanswerable Questions
for Text-to-SQL
- Authors: Bing Wang, Yan Gao, Zhoujun Li, Jian-Guang Lou
- Abstract summary: Existing text-to-SQL parsers generate a "plausible" SQL query for an arbitrary user question.
We propose a simple yet effective generation approach that automatically produces ambiguous and unanswerable examples.
Experimental results show that our model achieves the best result on both real-world examples and generated examples.
- Score: 36.5089235153207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The task of text-to-SQL aims to convert a natural language question into its
corresponding SQL query within the context of relational tables. Existing
text-to-SQL parsers generate a "plausible" SQL query for an arbitrary user
question, thereby failing to correctly handle problematic user questions. To
formalize this problem, we conduct a preliminary study on the observed
ambiguous and unanswerable cases in text-to-SQL and summarize them into 6
feature categories. Correspondingly, we identify the causes behind each
category and propose requirements for handling ambiguous and unanswerable
questions. Following this study, we propose a simple yet effective
counterfactual example generation approach that automatically produces
ambiguous and unanswerable text-to-SQL examples. Furthermore, we propose a
weakly supervised DTE (Detecting-Then-Explaining) model for error detection,
localization, and explanation. Experimental results show that our model
achieves the best result on both real-world examples and generated examples
compared with various baselines. We release our data and code at:
https://github.com/wbbeyourself/DTE.
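The abstract describes the counterfactual generation step only at a high level. As a rough illustration of how ambiguous and unanswerable examples could be derived from answerable seeds, here is a minimal Python sketch; the two perturbation rules, the class, and the function names are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of counterfactual example generation for text-to-SQL:
# perturb the schema of an answerable (question, schema) pair so that the
# question becomes unanswerable or ambiguous. The perturbation rules and all
# names here are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Example:
    question: str              # natural language question
    columns: List[str]         # flattened schema of the referenced table
    label: str = "answerable"  # answerable / unanswerable / ambiguous


def make_unanswerable(ex: Example, mentioned_column: str) -> Example:
    """Drop the column the question refers to, so no correct SQL exists."""
    remaining = [c for c in ex.columns if c != mentioned_column]
    return replace(ex, columns=remaining, label="unanswerable")


def make_ambiguous(ex: Example, near_synonym: str) -> Example:
    """Add a near-synonym column, so the mention maps to two plausible columns."""
    return replace(ex, columns=ex.columns + [near_synonym], label="ambiguous")


if __name__ == "__main__":
    seed = Example(
        question="What is the average salary of employees?",
        columns=["name", "salary", "department"],
    )
    # "salary" no longer exists in the schema -> unanswerable
    print(make_unanswerable(seed, mentioned_column="salary"))
    # "salary" vs. "base_pay" -> the column mention becomes ambiguous
    print(make_ambiguous(seed, near_synonym="base_pay"))
```

Per the abstract, the weakly supervised DTE model is then trained on such generated examples to detect, localize, and explain the problematic spans; the sketch above only covers the data-generation side.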
Related papers
- PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries [32.40808001281668]
Real user questions can often be ambiguous with multiple interpretations or unanswerable due to a lack of relevant data.
In this work, we construct a practical conversational text-to-SQL dataset.
We generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response.
arXiv Detail & Related papers (2024-10-14T20:36:35Z)
- Decoupling SQL Query Hardness Parsing for Text-to-SQL [2.30258928355895]
We introduce an innovative framework for Text-to-SQL based on decoupling query hardness parsing.
This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge.
arXiv Detail & Related papers (2023-12-11T07:20:46Z)
- Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain [21.593701177605652]
We propose a retrieval-augmented Text-to-SQL prompting framework, involving sample-aware prompting and a dynamic revision chain.
Our approach incorporates sample demonstrations and fine-grained information related to the given question.
To generate executable and accurate SQL queries without human intervention, we design a dynamic revision chain which iteratively adapts fine-grained feedback.
arXiv Detail & Related papers (2023-07-11T07:16:22Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [17.747079214502673]
Text-to-SQL is a task that converts a natural language question into a structured query language (SQL) query to retrieve information from a database.
In this paper, we propose an LLM-based framework for Text-to-SQL which retrieves helpful demonstration examples to prompt LLMs.
We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity (see the toy sketch after this list).
arXiv Detail & Related papers (2023-04-26T06:02:01Z)
- Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
SQL query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
- Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL [32.946103197082124]
TriageSQL is the first cross-domain text-to-SQL question intention classification benchmark.
It requires models to distinguish four types of unanswerable questions from answerable questions.
The baseline RoBERTa model achieves a 60% F1 score on the test set.
arXiv Detail & Related papers (2020-10-23T19:36:57Z)
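As referenced in the de-semanticization entry above, here is a toy Python sketch of the question-skeleton idea used for demonstration retrieval; the masking rules, the hard-coded schema terms, and the string-similarity measure are illustrative assumptions, not the cited paper's mechanism.

```python
# Toy sketch of "de-semanticization" for demonstration retrieval:
# strip domain-specific tokens from a question to obtain its skeleton, then
# rank candidate demonstrations by skeleton similarity. The masking rules and
# similarity measure are illustrative assumptions, not the cited paper's method.
import difflib
import re
from typing import List, Tuple

# Hypothetical schema vocabulary; a real system would read it from the database.
SCHEMA_TERMS = {"singer", "singers", "concert", "concerts", "age", "country"}


def skeletonize(question: str) -> str:
    """Replace quoted values, numbers, and schema mentions with placeholders."""
    q = re.sub(r"'[^']*'|\"[^\"]*\"", "[VAL]", question)  # quoted literals
    q = re.sub(r"\b\d+(\.\d+)?\b", "[NUM]", q)            # numbers
    tokens = ["[COL]" if t.lower() in SCHEMA_TERMS else t for t in q.split()]
    return " ".join(tokens)


def retrieve(query: str, pool: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Rank candidate demonstrations by skeleton similarity to the query."""
    q_skel = skeletonize(query)
    scored = [
        (difflib.SequenceMatcher(None, q_skel, skeletonize(c)).ratio(), c)
        for c in pool
    ]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    demos = [
        "How many singers are older than 30?",
        "List the name of every concert in 2014.",
        "What is the average age of singers from France?",
    ]
    for score, demo in retrieve("How many concerts happened after 2015?", demos):
        print(f"{score:.2f}  {demo}")
```

Skeleton matching of this kind is one plausible way to realize the "structural similarity" mentioned in that entry; the actual framework presumably uses its own masking and retrieval components.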
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.