Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs
- URL: http://arxiv.org/abs/2305.03111v3
- Date: Wed, 15 Nov 2023 04:56:25 GMT
- Title: Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs
- Authors: Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li,
Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou,
Chenhao Ma, Guoliang Li, Kevin C.C. Chang, Fei Huang, Reynold Cheng, Yongbin
Li
- Abstract summary: We present Bird, a big benchmark for large-scale database grounded in text-to-efficient tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-efficient models, i.e. ChatGPT, achieves only 40.08% in execution accuracy.
- Score: 89.68522473384522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-SQL parsing, which aims at converting natural language instructions
into executable SQLs, has gained increasing attention in recent years. In
particular, Codex and ChatGPT have shown impressive results in this task.
However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on
database schema with few rows of database contents leaving the gap between
academic study and real-world applications. To mitigate this gap, we present
Bird, a big benchmark for large-scale database grounded in text-to-SQL tasks,
containing 12,751 pairs of text-to-SQL data and 95 databases with a total size
of 33.4 GB, spanning 37 professional domains. Our emphasis on database values
highlights the new challenges of dirty database contents, external knowledge
between NL questions and database contents, and SQL efficiency, particularly in
the context of massive databases. To solve these problems, text-to-SQL models
must feature database value comprehension in addition to semantic parsing. The
experimental results demonstrate the significance of database values in
generating accurate text-to-SQLs for big databases. Furthermore, even the most
effective text-to-SQL models, i.e. ChatGPT, only achieves 40.08% in execution
accuracy, which is still far from the human result of 92.96%, proving that
challenges still stand. Besides, we also provide an efficiency analysis to
offer insights into generating text-to-efficient-SQLs that are beneficial to
industries. We believe that BIRD will contribute to advancing real-world
applications of text-to-SQL research. The leaderboard and source code are
available: https://bird-bench.github.io/.
Related papers
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows [64.94146689665628]
Spider 2.0 is an evaluation framework for real-world text-to-sql problems derived from enterprise-level database use cases.
The databases in Spider 2.0 are sourced from real data applications, often containing over 1,000 columns and stored in local or cloud database systems such as BigQuery and Snowflake.
We show that solving problems in Spider 2.0 frequently requires understanding and searching through database metadata, dialect documentation, and even project-levels.
arXiv Detail & Related papers (2024-11-12T12:52:17Z) - E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E- repository, a novel pipeline designed to address challenges through direct schema linking and candidate predicate augmentation.
E- enhances the natural language query by incorporating relevant database items (i.e. tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure.
We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
arXiv Detail & Related papers (2024-09-25T09:02:48Z) - SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL [3.422309388045878]
We introduce SelECT-, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought, self-correction, and ensemble methods.
Specifically, when configured using GPT as the base LLM, SelECT-Turbo achieves 84.2% execution accuracy on the Spider leaderboard's development set.
arXiv Detail & Related papers (2024-09-16T05:40:18Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task.
We propose RB-, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural
Language to SQL Systems [16.33799752421288]
We introduce ScienceBenchmark, a new complex NL-to- benchmark for three real-world, highly domain-specific databases.
We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark.
arXiv Detail & Related papers (2023-06-07T19:37:55Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems.
It is composed of publicly available text-to-domain datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-corpora parsing is to convert a natural language (NL) question to its corresponding structured query language () based on the evidences provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.