ValueNet: A Natural Language-to-SQL System that Learns from Database
Information
- URL: http://arxiv.org/abs/2006.00888v2
- Date: Mon, 22 Feb 2021 09:31:01 GMT
- Title: ValueNet: A Natural Language-to-SQL System that Learns from Database
Information
- Authors: Ursin Brunner and Kurt Stockinger
- Abstract summary: Building natural language interfaces for databases has been a long-standing challenge.
The recent focus of research has been on neural networks to tackle this challenge on complex datasets like Spider.
We propose two end-to-end NL-to-SQL systems that incorporate values using the challenging Spider dataset.
- Score: 4.788755317132195
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Building natural language (NL) interfaces for databases has been a
long-standing challenge for several decades. The major advantage of these
so-called NL-to-SQL systems is that end-users can query complex databases
without the need to know SQL or the underlying database schema. Due to
significant advancements in machine learning, the recent focus of research has
been on neural networks to tackle this challenge on complex datasets like
Spider. Several recent NL-to-SQL systems achieve promising results on this
dataset. However, none of the published systems that provide either the source
code or executable binaries extract and incorporate values from the user
questions for generating SQL statements. Thus, the practical use of these
systems in a real-world scenario has not been sufficiently demonstrated yet.
In this paper we propose ValueNet light and ValueNet -- two end-to-end
NL-to-SQL systems that incorporate values using the challenging Spider dataset.
The main idea of our approach is to use not only metadata information from the
underlying database but also information on the base data as input for our
neural network architecture. In particular, we propose a novel architecture
sketch to extract values from a user question and come up with possible value
candidates which are not explicitly mentioned in the question. We then use a
neural model based on an encoder-decoder architecture to synthesize the SQL
query. Finally, we evaluate our model on the Spider challenge using the
Execution Accuracy metric, a more difficult metric than the one used by most
participants of the challenge. Our experimental evaluation demonstrates that
ValueNet light and ValueNet reach state-of-the-art results of 67% and 62%
accuracy, respectively, for translating from NL to SQL whilst incorporating
values.
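To make the two ideas above concrete, the sketch below illustrates (a) heuristic extraction of value candidates from a user question with the help of sampled base data, and (b) an Execution Accuracy check that compares the predicted and the gold SQL query by their result sets. This is a minimal illustration only, not the ValueNet implementation: the function names, the fuzzy-matching heuristic, and the use of SQLite are assumptions made for exposition.

    import re
    import sqlite3
    from difflib import get_close_matches

    def extract_value_candidates(question, base_data):
        """Collect candidate values for the SQL query: quoted strings and
        numbers are taken literally from the question; the remaining tokens
        are fuzzy-matched against sampled base data (column name -> example
        values) to surface values the user only hinted at."""
        quoted = re.findall(r'"([^"]+)"|\'([^\']+)\'', question)
        candidates = {v for pair in quoted for v in pair if v}
        candidates |= set(re.findall(r'\b\d+(?:\.\d+)?\b', question))
        tokens = re.findall(r'\w+', question.lower())
        for values in base_data.values():
            lowered = [str(v).lower() for v in values]
            for tok in tokens:
                for match in get_close_matches(tok, lowered, n=1, cutoff=0.85):
                    candidates.add(values[lowered.index(match)])
        return sorted(candidates, key=str)

    def execution_accuracy(db_path, predicted_sql, gold_sql):
        """A predicted query counts as correct if it executes and returns
        the same result set as the gold query, ignoring row order."""
        with sqlite3.connect(db_path) as conn:
            try:
                predicted_rows = conn.execute(predicted_sql).fetchall()
            except sqlite3.Error:
                return False  # a query that fails to execute cannot be correct
            gold_rows = conn.execute(gold_sql).fetchall()
        return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))

For a question such as 'How many concerts took place in "Madison Square Garden" in 2014?', the extractor would pick up '2014' from the digits and 'Madison Square Garden' from the quotes, candidates that an encoder-decoder model could then attach to the appropriate WHERE clauses.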
Related papers
- CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z) - Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice.
It is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022.
All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z) - ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural
Language to SQL Systems [16.33799752421288]
We introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases.
We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark.
arXiv Detail & Related papers (2023-06-07T19:37:55Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL with large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present BIRD, a big benchmark for large-scale database grounded text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL model, i.e. ChatGPT, achieves only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task with neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Deep Learning Driven Natural Languages Text to SQL Query Conversion: A
Survey [2.309914459672557]
In this paper, we try to present a holistic overview of 24 recent neural network models studied in the last couple of years.
We also give an overview of 11 datasets that are widely used to train models for TEXT2SQL technologies.
arXiv Detail & Related papers (2022-08-08T20:54:34Z) - "What Do You Mean by That?" A Parser-Independent Interactive Approach
for Enhancing Text-to-SQL [49.85635994436742]
We include the human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users using multi-choice questions.
PIIA is capable of enhancing the text-to-SQL performance with limited interaction turns by using both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z) - Data Agnostic RoBERTa-based Natural Language to SQL Query Generation [0.0]
The NL2SQL task aims at finding deep learning approaches to solve the problem of converting natural language questions into valid SQL queries.
We have presented an approach with data privacy at its core.
Although we have not achieved state-of-the-art results, we have eliminated the need for the table data right from the training of the model.
arXiv Detail & Related papers (2020-10-11T13:18:46Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a SQL mapping cannot be immediately determined.
The proposed method effectively improves the robustness of the text-to-SQL system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z)