Text to Query Plans for Question Answering on Large Tables
- URL: http://arxiv.org/abs/2508.18758v1
- Date: Tue, 26 Aug 2025 07:35:26 GMT
- Title: Text to Query Plans for Question Answering on Large Tables
- Authors: Yipeng Zhang, Chen Wang, Yuzhe Zhang, Jacky Jiang,
- Abstract summary: We propose a novel framework that transforms natural language queries into query plans.<n>We enable complex analytical functions, such as principal component analysis and anomaly detection.<n>We validate our framework through experiments on both standard databases and large scientific tables.
- Score: 4.917892629916144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient querying and analysis of large tabular datasets remain significant challenges, especially for users without expertise in programming languages like SQL. Text-to-SQL approaches have shown promising performance on benchmark data; however, they inherit SQL's drawbacks, including inefficiency with large datasets and limited support for complex data analyses beyond basic querying. We propose a novel framework that transforms natural language queries into query plans. Our solution is implemented outside traditional databases, allowing us to support classical SQL commands while avoiding SQL's inherent limitations. Additionally, we enable complex analytical functions, such as principal component analysis and anomaly detection, providing greater flexibility and extensibility than traditional SQL capabilities. We leverage LLMs to iteratively interpret queries and construct operation sequences, addressing computational complexity by incrementally building solutions. By executing operations directly on the data, we overcome context length limitations without requiring the entire dataset to be processed by the model. We validate our framework through experiments on both standard databases and large scientific tables, demonstrating its effectiveness in handling extensive datasets and performing sophisticated data analyses.
Related papers
- Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles [81.89404347890662]
SciTrek is a novel question-answering benchmark designed to evaluate the long-context reasoning capabilities of large language models (LLMs) using scientific articles.<n>Our analysis reveals systematic shortcomings in models' abilities to perform basic numerical operations and accurately locate specific information in long contexts.
arXiv Detail & Related papers (2025-09-25T11:36:09Z) - STARQA: A Question Answering Dataset for Complex Analytical Reasoning over Structured Databases [27.66819120859756]
We introduce STARQA, the first public human-created dataset of complex analytical reasoning questions and answers on three specialized relational-domain databases.<n>In this paper, we introduce STARQA, the first public human-created dataset of complex analytical reasoning questions and answers on three specialized relational-domain databases.
arXiv Detail & Related papers (2025-09-23T19:26:16Z) - HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration [1.3927943269211591]
Text-to-generation bridges the gap between natural language and databases, enabling users to query data without requiringsql expertise.<n>We propose HI-the, a pipeline that incorporates a novel hint generation mechanism utilizing historical query logs.<n>By analyzing prior queries, our method generates contextual hints that focus on handling the complexities of multi-table and nested operations.<n>Our approach significantly improves query accuracy of LLM-generated queries while ensuring efficiency in terms of calls and latency.
arXiv Detail & Related papers (2025-06-11T12:07:55Z) - LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations.<n>This structured approach allows Large Language Models (LLMs) to generate and executesql queries, enhancing generalization and mitigating biases.
arXiv Detail & Related papers (2025-06-06T05:14:04Z) - Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation [25.638927795540454]
We introduce the Text-to-No task, which aims to convert natural language queries into accessible queries.<n>To promote research in this area, we released a large-scale and open-source dataset for this task, named TEND (short interfaces for Text-to-No dataset)<n>We also designed a SLM (Small Language Model)-assisted and RAG (Retrieval-augmented Generation)-assisted multi-step framework called SMART, which is specifically designed for Text-to-No conversion.
arXiv Detail & Related papers (2025-02-16T17:01:48Z) - A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges [0.7889270818022226]
Text-to-one systems facilitate smooth interaction with databases by translating natural language queries into Structured Query Language (technical)<n>This survey provides an overview of the evolution of AI-driven text-to-one systems.<n>We examine the applications of text-to-one in domains like healthcare, education, and finance.
arXiv Detail & Related papers (2024-12-06T17:36:28Z) - E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-Seek, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation.<n>E-Seek enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question andsql construction plan, bridging the gap between the query and the database structure.<n> Comprehensive evaluations illustrate that E-Seek achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set.
arXiv Detail & Related papers (2024-09-25T09:02:48Z) - RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL [48.516004807486745]
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to- task.
We propose RB-, a novel retrieval-based framework for in-context prompt engineering.
Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
arXiv Detail & Related papers (2024-07-11T08:19:58Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for Text-to- task.
Large language models (LLMs) have emerged as a new paradigm for Text-to- task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.