KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- URL: http://arxiv.org/abs/2106.11455v1
- Date: Tue, 22 Jun 2021 00:08:03 GMT
- Title: KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- Authors: Chia-Hsuan Lee, Oleksandr Polozov, Matthew Richardson
- Abstract summary: We present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases.
We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers, but that a more realistic evaluation setting and creative use of associated database documentation boost their accuracy by over 13.2%.
- Score: 26.15889661083109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of database question answering is to enable natural language
querying of real-life relational databases in diverse application domains.
Recently, large-scale datasets such as Spider and WikiSQL facilitated novel
modeling techniques for text-to-SQL parsing, improving zero-shot generalization
to unseen databases. In this work, we examine the challenges that still prevent
these techniques from practical deployment. First, we present KaggleDBQA, a new
cross-domain evaluation dataset of real Web databases, with domain-specific
data types, original formatting, and unrestricted questions. Second, we
re-examine the choice of evaluation tasks for text-to-SQL parsers as applied in
real-life settings. Finally, we augment our in-domain evaluation task with
database documentation, a naturally occurring source of implicit domain
knowledge. We show that KaggleDBQA presents a challenge to state-of-the-art
zero-shot parsers, but a more realistic evaluation setting and creative use of
associated database documentation boost their accuracy by over 13.2%, doubling
their performance.
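A minimal sketch of the text-to-SQL task and the execution-based comparison that benchmarks such as KaggleDBQA and Spider rely on. The schema, rows, question, and queries below are invented for illustration; they are not drawn from KaggleDBQA itself.

```python
import sqlite3

# Toy in-memory database standing in for one "unseen" evaluation DB.
# Schema, data, and queries are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pedestrian (id INTEGER PRIMARY KEY, borough TEXT, injuries INTEGER);
    INSERT INTO pedestrian VALUES (1, 'Brooklyn', 3), (2, 'Queens', 1), (3, 'Brooklyn', 2);
""")

# Natural-language question: "How many incidents were recorded in Brooklyn?"
gold_sql = "SELECT COUNT(*) FROM pedestrian WHERE borough = 'Brooklyn'"
predicted_sql = "SELECT COUNT(id) FROM pedestrian WHERE borough = 'Brooklyn'"

# Execution accuracy compares result sets, so syntactically different
# queries can still count as correct.
gold = conn.execute(gold_sql).fetchall()
pred = conn.execute(predicted_sql).fetchall()
print(gold == pred)  # True: both return [(2,)]
```

Comparing executed results rather than query strings is what lets a parser get credit for semantically equivalent but differently written SQL.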
Related papers
- CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z)
- Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice.
It is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022.
All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL parsing using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve into the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL evaluation.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present BIRD, a big benchmark for large-scale database grounded text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL models, e.g. ChatGPT, achieve only 40.08% execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data [3.06261471569622]
SEDE is a dataset with 12,023 pairs of utterances and SQL queries collected from real usage on the Stack Exchange website.
We show that these pairs contain a variety of real-world challenges which were rarely reflected so far in any other semantic parsing dataset.
arXiv Detail & Related papers (2021-06-09T12:09:51Z)
- Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)
- "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [49.85635994436742]
We include a human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users via multiple-choice questions.
PIIA is capable of enhancing text-to-SQL performance with limited interaction turns, as shown by both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z)
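The tagged-sequence encoding described in the BRIDGE entry above can be sketched roughly as follows. The tag tokens, schema, cell values, and string-matching heuristic are simplified assumptions for illustration, not the paper's exact implementation.

```python
def serialize_bridge_style(question, schema, cell_values):
    """Roughly BRIDGE-style serialization: the question, then each table
    and its columns as a tagged sequence, appending cell values that
    appear (case-insensitively) in the question. The [T]/[C]/[V] tags are
    simplified placeholders, not the paper's exact vocabulary."""
    parts = [question]
    for table, columns in schema.items():
        parts.append(f"[T] {table}")
        for col in columns:
            parts.append(f"[C] {col}")
            for value in cell_values.get((table, col), []):
                if value.lower() in question.lower():
                    parts.append(f"[V] {value}")
    return " ".join(parts)

# Hypothetical single-table schema and cell values for illustration.
schema = {"pedestrian": ["borough", "injuries"]}
cells = {("pedestrian", "borough"): ["Brooklyn", "Queens"]}
print(serialize_bridge_style("How many incidents in Brooklyn?", schema, cells))
# How many incidents in Brooklyn? [T] pedestrian [C] borough [V] Brooklyn [C] injuries
```

Anchoring question tokens to matching cell values in the serialized input is what gives the encoder an explicit bridge between the text and the database contents.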
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.