Benchmarking the Text-to-SQL Capability of Large Language Models: A
Comprehensive Evaluation
- URL: http://arxiv.org/abs/2403.02951v2
- Date: Wed, 6 Mar 2024 08:43:17 GMT
- Title: Benchmarking the Text-to-SQL Capability of Large Language Models: A
Comprehensive Evaluation
- Authors: Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang,
Chi Harold Liu, Rui Zhao, Ziyue Li, Hangyu Mao
- Abstract summary: Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task.
There is still no consensus on the optimal prompt templates and design frameworks.
Existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL process.
- Score: 33.41556606816004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have emerged as a powerful tool in advancing the
Text-to-SQL task, significantly outperforming traditional methods.
Nevertheless, as a nascent research field, there is still no consensus on the
optimal prompt templates and design frameworks. Additionally, existing
benchmarks inadequately explore the performance of LLMs across the various
sub-tasks of the Text-to-SQL process, which hinders the assessment of LLMs'
cognitive capabilities and the optimization of LLM-based solutions. To address
the aforementioned issues, we first construct a new dataset designed to
mitigate the risk of overfitting in LLMs. Then we formulate five evaluation
tasks to comprehensively assess the performance of diverse methods across
various LLMs throughout the Text-to-SQL process. Our study highlights the
performance disparities among LLMs and proposes optimal in-context learning
solutions tailored to each task. These findings offer valuable insights for
enhancing the development of LLM-based Text-to-SQL systems.
Related papers
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems [1.1060425537315088]
This survey provides a comprehensive study of the evolution of LLM-based Text-to-SQL systems.
We discuss benchmarks, evaluation methods and evaluation metrics.
We highlight key challenges such as efficiency, model privacy, and data privacy with perspectives toward their development and improvements in potential areas.
arXiv Detail & Related papers (2024-10-01T20:46:25Z)
- PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs.
This paper conducts the first comprehensive experiment to investigate how far we have come in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z)
- Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm [19.06214756792692]
In-context learning of large language models (LLMs) has achieved remarkable success in the field of natural language processing.
Case studies reveal that the single-step chain-of-thought approach faces challenges such as attention diffusion and inadequate performance in complex tasks like Text-to-SQL.
A workflow paradigm is proposed, aiming to enhance the attention and problem-solving scope of LLMs through decomposition.
arXiv Detail & Related papers (2024-02-16T13:24:05Z)
- Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deeply into the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [20.15851744895469]
In-context learning (ICL) has emerged as a new approach to various natural language processing tasks.
In this paper, we aim to extend this method to question answering tasks that utilize structured knowledge sources.
arXiv Detail & Related papers (2023-05-21T22:44:25Z)