Related papers: Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

URL: http://arxiv.org/abs/2412.17867v4
Date: Tue, 08 Apr 2025 02:23:17 GMT
Title: Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types
Authors: Ziming Guo, Chao Ma, Yinggang Sun, Tiancheng Zhao, Guangyao Wang, Hai Huang
Abstract summary: Large language models (LLMs) have significantly advanced text-to-SQL systems. LLMs often narrowly focus on SQL generation, neglecting the complexities of real-world conversational queries. We propose MMSQL, a test suite designed to evaluate the question classification and SQL generation capabilities of LLMs.
Score: 11.391598870596392
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in large language models (LLMs) have significantly advanced text-to-SQL systems. However, most LLM-based methods often narrowly focus on SQL generation, neglecting the complexities of real-world conversational queries. This oversight can lead to unreliable responses, particularly for ambiguous questions that cannot be directly addressed with SQL. To bridge this gap, we propose MMSQL, a comprehensive test suite designed to evaluate the question classification and SQL generation capabilities of LLMs by simulating real-world scenarios with diverse question types and multi-turn Q&A interactions. Using MMSQL, we assessed the performance of popular LLMs, including both open-source and closed-source models, and identified key factors impacting their performance in such scenarios. Moreover, we introduce an LLM-based multi-agent framework that employs specialized agents to identify question types and determine appropriate answering strategies. Our experiments demonstrate that this approach significantly enhances the model's ability to navigate the complexities of conversational dynamics, effectively handling the diverse and complex nature of user queries. Our dataset and code are publicly available at https://mcxiaoxiao.github.io/MMSQL.

Related papers

A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback [40.19592881059662]
Large Language Models (LLMs) have demonstrated superior performance for generating Text2SQL queries. Privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. We propose MATS, a novel Text2SQL framework designed specifically for SLMs.
arXiv Detail & Related papers (2025-12-21T06:43:47Z)
Exploring the Use of LLMs for SQL Equivalence Checking [15.42143912008553]
Equivalence checking of two SQL queries is an intractable problem. Existing methods can handle only a small subset of SQL, even for bounded equivalence checking. This paper explores whether large language models (LLMs) can also demonstrate the ability to reason with SQL queries.
arXiv Detail & Related papers (2024-12-07T06:50:12Z)
PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks. In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel in many natural language processing (NLP) tasks. They can only incorporate new knowledge through training or supervised fine-tuning processes. This precise, up-to-date, and private information is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z)
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task. We propose RB-SQL, a novel retrieval-based framework for in-context prompt engineering. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z)
Lucy: Think and Reason to Solve Text-to-SQL [12.52968634440807]
Large Language Models (LLMs) have made significant progress in assisting users to query databases in natural language. LLMs provide state-of-the-art results on many standard benchmarks, but their performance significantly drops when applied to large enterprise databases. We propose a new solution that combines the power of LLMs in understanding questions with automated reasoning techniques to handle complex database constraints.
arXiv Detail & Related papers (2024-07-06T18:56:42Z)
Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment. Multiple LLM-based agents independently explore and then answer queries about a household environment. We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [15.75829309721909]
Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge. PLMs have been developed and utilized for text-to-SQL tasks, achieving promising performance. Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding.
arXiv Detail & Related papers (2024-06-12T17:13:17Z)
MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation [10.726734105960924]
Large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to-SQL tasks. This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers. We establish a new SOTA performance on BIRD in terms of both the accuracy and efficiency of the generated queries.
arXiv Detail & Related papers (2024-05-13T04:59:32Z)
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions [22.493487741249716]
Large Language Models (LLMs) have been demonstrated to possess impressive capabilities in a variety of domains and tasks. We investigate the issue of prompt design in the multi-turn text-to-SQL task and attempt to enhance the LLMs' reasoning capacity.
arXiv Detail & Related papers (2024-05-04T16:56:14Z)
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models [46.07900122810749]
Large language models (LLMs) have achieved unprecedented performances in various applications, yet evaluating them is still challenging. We contend that utilizing existing relational databases is a promising approach for constructing benchmarks. We propose ERBench, which uses these integrity constraints to convert any database into an LLM benchmark.
arXiv Detail & Related papers (2024-03-08T12:42:36Z)
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering [25.57202500348071]
This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models interact with a database. The task requires LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction.
arXiv Detail & Related papers (2023-11-16T09:55:07Z)
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL filtering using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
Querying Large Language Models with SQL [16.383179496709737]
In many use-cases, information is stored in text but not available in structured data. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. We present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM.
arXiv Detail & Related papers (2023-04-02T06:58:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.