Did You Ask a Good Question? A Cross-Domain Question Intention
Classification Benchmark for Text-to-SQL
- URL: http://arxiv.org/abs/2010.12634v1
- Date: Fri, 23 Oct 2020 19:36:57 GMT
- Title: Did You Ask a Good Question? A Cross-Domain Question Intention
Classification Benchmark for Text-to-SQL
- Authors: Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao Yu, Peng Shi and Rui
Zhang
- Abstract summary: TriageSQL is the first cross-domain text-to-SQL question intention classification benchmark.
It requires models to distinguish four types of unanswerable questions from answerable questions.
The baseline RoBERTa model achieves a 60% F1 score on the test set.
- Score: 32.946103197082124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models have achieved significant results on the text-to-SQL task, in
which most current work assumes all the input questions are legal and generates
a SQL query for any input. However, in the real scenario, users can input any
text that may not be able to be answered by a SQL query. In this work, we
propose TriageSQL, the first cross-domain text-to-SQL question intention
classification benchmark that requires models to distinguish four types of
unanswerable questions from answerable questions. The baseline RoBERTa model
achieves a 60% F1 score on the test set, demonstrating the need for further
improvement on this task. Our dataset is available at
https://github.com/chatc/TriageSQL.
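The task described in the abstract is a five-way classification (answerable plus four unanswerable types) scored with F1. The sketch below shows one way such a score could be computed. It assumes macro-averaged F1, which the abstract does not specify, and the label names are hypothetical placeholders, not the dataset's actual category names.

```python
from collections import Counter

# Hypothetical placeholder labels: one answerable class plus four
# unanswerable types. These are NOT the dataset's real label names.
LABELS = [
    "answerable",
    "unanswerable_a",
    "unanswerable_b",
    "unanswerable_c",
    "unanswerable_d",
]


def per_class_f1(gold, pred, labels=LABELS):
    """Return a dict mapping each label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (an assumed averaging choice)."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)
```

Macro averaging weights every class equally, so a model that labels everything "answerable" scores poorly even if answerable questions dominate the data. A micro-averaged or weighted F1 would behave differently.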
Related papers
- Decoupling SQL Query Hardness Parsing for Text-to-SQL [2.30258928355895]
We introduce an innovative framework for Text-to-SQL based on decoupling query hardness parsing.
This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge.
arXiv Detail & Related papers (2023-12-11T07:20:46Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Know What I don't Know: Handling Ambiguous and Unanswerable Questions
for Text-to-SQL [36.5089235153207]
Existing text-to-SQL parsers generate a "plausible" query for an arbitrary user question.
We propose a simple yet effective generation approach that automatically produces ambiguous and unanswerable examples.
Experimental results show that our model achieves the best result on both real-world examples and generated examples.
arXiv Detail & Related papers (2022-12-17T15:32:00Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder
for Text-to-SQL Parsers [66.78665327694625]
We propose S$^2$SQL, injecting Syntax into the question-Schema graph encoder for Text-to-SQL parsers.
We also employ the decoupling constraint to induce diverse edge embedding, which further improves the network's performance.
Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used.
arXiv Detail & Related papers (2022-03-14T09:49:15Z)
- Prefix-to-SQL: Text-to-SQL Generation from Incomplete User Questions [33.48258057604425]
We propose a new task, prefix-to-SQL, which takes a question prefix from users as the input and predicts the intended SQL.
We construct a new benchmark called PAGSAS that contains 124K user question prefixes and the intended SQL for 5 sub-tasks: Advising, GeoQuery, Scholar, ATIS, and Spider.
Because the difficulty of prefix-to-SQL is related to the number of omitted tokens, we incorporate curriculum learning, feeding examples with an increasing number of omitted tokens.
arXiv Detail & Related papers (2021-09-15T14:28:18Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
- Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a mapping cannot be immediately determined.
The proposed method effectively improves the robustness of the text-to-SQL system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.