Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
- URL: http://arxiv.org/abs/2010.05243v3
- Date: Fri, 5 Mar 2021 05:55:10 GMT
- Title: Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
- Authors: Debaditya Pal, Harsh Sharma, Kaustubh Chaudhari
- Abstract summary: The NL2 task aims at finding deep learning approaches to solve the problem converting by natural language questions into valid queries.
We have presented an approach with data privacy at its core.
Although we have not achieved state of the art results, we have eliminated the need for the table right from the training of the model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational databases are among the most widely used architectures to store
massive amounts of data in the modern world. However, there is a barrier
between these databases and the average user. The user often lacks the
knowledge of a query language such as SQL required to interact with the
database. The NL2SQL task aims at finding deep learning approaches to solve
this problem by converting natural language questions into valid SQL queries.
Given the sensitive nature of some databases and the growing need for data
privacy, we have presented an approach with data privacy at its core. We have
passed RoBERTa embeddings and data-agnostic knowledge vectors into LSTM based
submodels to predict the final query. Although we have not achieved state of
the art results, we have eliminated the need for the table data, right from the
training of the model, and have achieved a test set execution accuracy of
76.7%. By eliminating the table data dependency while training we have created
a model capable of zero shot learning based on the natural language question
and table schema alone.
Related papers
- Table Question Answering for Low-resourced Indic Languages [71.57359949962678]
TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output.
We introduce a fully automatic large-scale tableQA data generation process for low-resource languages with limited budget.
We incorporate our data generation method on two Indic languages, Bengali and Hindi, which have no tableQA datasets or models.
arXiv Detail & Related papers (2024-10-04T16:26:12Z) - E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E- repository, a novel pipeline designed to address challenges through direct schema linking and candidate predicate augmentation.
E- enhances the natural language query by incorporating relevant database items (i.e. tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure.
We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
arXiv Detail & Related papers (2024-09-25T09:02:48Z) - Text2SQL is Not Enough: Unifying AI and Databases with TAG [47.45480855418987]
Table-Augmented Generation (TAG) is a paradigm for answering natural language questions over databases.
We develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly.
arXiv Detail & Related papers (2024-08-27T00:50:14Z) - Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present Bird, a big benchmark for large-scale database grounded in text-to-efficient tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-efficient models, i.e. ChatGPT, achieves only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-corpora parsing is to convert a natural language (NL) question to its corresponding structured query language () based on the evidences provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Deep Learning Driven Natural Languages Text to SQL Query Conversion: A
Survey [2.309914459672557]
In this paper, we try to present a holistic overview of 24 recent neural network models studied in the last couple of years.
We also give an overview of 11 datasets that are widely used to train models for TEXT2 technologies.
arXiv Detail & Related papers (2022-08-08T20:54:34Z) - Towards a Natural Language Query Processing System [0.0]
This paper reports our study on the design and development of a natural language query interface to a backend relational database.
The novelty in the study lies in defining a graph database as a middle layer to store necessary metadata needed to transform a natural language query into structured query language.
The translation results for some sample queries yielded a 90% accuracy rate.
arXiv Detail & Related papers (2020-09-25T19:52:20Z) - Photon: A Robust Cross-Domain Text-to-SQL System [189.1405317853752]
We present Photon, a robust, modular, cross-domain NLIDB that can flag natural language input to which a mapping cannot be immediately determined.
The proposed method effectively improves the robustness of text-to-native system against untranslatable user input.
arXiv Detail & Related papers (2020-07-30T07:44:48Z) - TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z) - Recent Advances in SQL Query Generation: A Survey [0.0]
With the rise of deep learning techniques, there is extensive ongoing research in designing a suitable natural language interface to relational databases.
We describe models with various architectures such as convolutional neural networks, recurrent neural networks, pointer networks, reinforcement learning, etc.
arXiv Detail & Related papers (2020-05-15T17:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.