Text2SQL is Not Enough: Unifying AI and Databases with TAG
- URL: http://arxiv.org/abs/2408.14717v1
- Date: Tue, 27 Aug 2024 00:50:14 GMT
- Title: Text2SQL is Not Enough: Unifying AI and Databases with TAG
- Authors: Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia
- Abstract summary: Table-Augmented Generation (TAG) is a paradigm for answering natural language questions over databases.
We develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly.
- Score: 47.45480855418987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitrary natural language questions over custom data sources. However, existing methods and benchmarks insufficiently explore this setting. Text2SQL methods focus solely on natural language questions that can be expressed in relational algebra, representing a small subset of the questions real users wish to ask. Likewise, Retrieval-Augmented Generation (RAG) considers the limited subset of queries that can be answered with point lookups to one or a few data records within the database. We propose Table-Augmented Generation (TAG), a unified and general-purpose paradigm for answering natural language questions over databases. The TAG model represents a wide range of interactions between the LM and database that have been previously unexplored and creates exciting research opportunities for leveraging the world knowledge and reasoning capabilities of LMs over data. We systematically develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly, confirming the need for further research in this area. We release code for the benchmark at https://github.com/TAG-Research/TAG-Bench.
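For intuition, the sketch below shows one way a TAG-style pipeline could be wired, assuming a three-step decomposition into query synthesis, query execution, and answer generation; the `lm` helper, the prompts, and the SQLite usage are illustrative placeholders rather than the paper's implementation.

```python
# Minimal, hypothetical sketch of a TAG-style pipeline (not the paper's code).
# Assumptions: an `lm(prompt) -> str` callable and a SQLite database are available.
import sqlite3

def lm(prompt: str) -> str:
    """Placeholder for a language-model call (e.g., an API client)."""
    raise NotImplementedError

def tag_pipeline(question: str, conn: sqlite3.Connection, schema: str) -> str:
    # 1. Query synthesis: the LM translates the question into an executable query.
    sql = lm(f"Schema:\n{schema}\n\nWrite a SQL query that gathers the rows "
             f"needed to answer: {question}\nReturn only SQL.")
    # 2. Query execution: the database performs the scalable, exact computation.
    rows = conn.execute(sql).fetchall()
    # 3. Answer generation: the LM reasons over the retrieved rows, adding
    #    world knowledge and semantic judgment that SQL alone cannot express.
    return lm(f"Question: {question}\nRelevant rows: {rows}\n"
              f"Answer the question using the rows and your own knowledge.")
```

In contrast, a pure Text2SQL system would stop after step 2, and a pure RAG system would skip the exact database computation in favor of retrieving a few records directly.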
Related papers
- MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases [15.59894560371822]
Cloud service providers are looking for a unified database manager service that can support multiple dialects.
MoMQ is a novel Mixture-of-Experts-based multi-dialect query generation framework.
MoMQ employs a dialect expert group for each dialect and a multi-level routing strategy to handle dialect-specific knowledge.
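To make the routing idea concrete, here is a minimal, hypothetical sketch of dispatching a question to a per-dialect expert; the class and function names are illustrative assumptions, not MoMQ's actual architecture or code.

```python
# Hypothetical illustration of dialect-expert routing in the spirit of MoMQ;
# the routing is a single classification step here, not MoMQ's multi-level strategy.
from typing import Callable, Dict

Expert = Callable[[str], str]  # maps a natural language question to a query string

class DialectRouter:
    def __init__(self, experts: Dict[str, Expert], classify: Callable[[str], str]):
        self.experts = experts    # one expert (or expert group) per dialect
        self.classify = classify  # routing step: pick the target dialect

    def generate(self, question: str) -> str:
        dialect = self.classify(question)   # e.g. "postgresql" or "cypher"
        return self.experts[dialect](question)  # dialect-specific query text
```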
arXiv Detail & Related papers (2024-10-24T03:42:43Z) - A System and Benchmark for LLM-based Q&A on Heterogeneous Data [17.73258512415368]
We introduce the siwarex platform, which enables seamless natural language access to both databases and APIs.
Our modified Spider benchmark will soon be available to the research community.
arXiv Detail & Related papers (2024-09-09T15:44:39Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
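As a point of reference, a generic retrieval-augmented generation loop looks roughly like the sketch below; the `retrieve` and `lm` helpers are placeholders, and this is not DIVKNOWQA's system.

```python
# Minimal sketch of retrieval-augmented generation as a grounding strategy.
from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               lm: Callable[[str], str],
               k: int = 5) -> str:
    # Fetch the k passages (or records) most relevant to the question.
    passages = retrieve(question, k)
    # Condition the LM on the retrieved evidence so the answer is grounded,
    # reducing reliance on possibly hallucinated parametric knowledge.
    context = "\n".join(passages)
    return lm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```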
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z) - Querying Large Language Models with SQL [16.383179496709737]
In many use-cases, information is stored in text but not available in structured data.
With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents.
We present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM.
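To illustrate the idea of a physical operator backed by an LLM, here is a hypothetical scan-style operator that sources tuples by prompting the model; the prompt format and output parsing are assumptions, not Galois's actual operators.

```python
# Hypothetical sketch of a scan-like physical operator that reads tuples from an LLM.
from typing import Callable, Iterator, List

def llm_scan(lm: Callable[[str], str], entity: str, attributes: List[str]) -> Iterator[tuple]:
    """Instead of scanning a stored table, ask the LLM to emit rows."""
    prompt = (f"List {entity} as CSV with columns {', '.join(attributes)}. "
              f"One row per line, no header.")
    for line in lm(prompt).splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(attributes):  # keep only well-formed rows
            yield tuple(values)

# A relational pipeline could then filter or join these tuples like any other scan,
# e.g. rows = list(llm_scan(lm, "European capitals", ["country", "city"]))
```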
arXiv Detail & Related papers (2023-04-02T06:58:14Z) - Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z) - Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning while offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z) - Translating synthetic natural language to database queries: a polyglot deep learning framework [0.0]
Polyglotter supports the mapping of natural language searches to database queries.
It does not require the creation of manually annotated data for training.
Our results indicate that our framework performs well on both synthetic and real databases.
arXiv Detail & Related papers (2021-04-14T17:43:51Z) - "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [49.85635994436742]
We include a human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users through multi-choice questions.
PIIA is capable of enhancing text-to-SQL performance with limited interaction turns, using both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z) - Data Agnostic RoBERTa-based Natural Language to SQL Query Generation [0.0]
The NL2SQL task aims at finding deep learning approaches to convert natural language questions into valid SQL queries.
We have presented an approach with data privacy at its core.
Although we have not achieved state-of-the-art results, we have eliminated the need for the table itself right from the training of the model.
arXiv Detail & Related papers (2020-10-11T13:18:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.