TailorSQL: An NL2SQL System Tailored to Your Query Workload
- URL: http://arxiv.org/abs/2505.23039v1
- Date: Thu, 29 May 2025 03:27:22 GMT
- Title: TailorSQL: An NL2SQL System Tailored to Your Query Workload
- Authors: Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska
- Abstract summary: State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information. We introduce TailorSQL, an NL2SQL system that takes advantage of information in the past query workload. TailorSQL achieves up to 2$\times$ improvement in execution accuracy on standardized benchmarks.
- Score: 16.48291142955493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM) using the question and retrieved information to generate the SQL query. However, existing NL2SQL techniques miss a key opportunity which is present in real-world settings: NL2SQL is typically applied on existing databases which have already served many SQL queries in the past. The past query workload implicitly contains information which is helpful for accurate NL2SQL translation and is not apparent from the database schema alone, such as common join paths and the semantics of obscurely-named tables and columns. We introduce TailorSQL, a NL2SQL system that takes advantage of information in the past query workload to improve both the accuracy and latency of translating natural language questions into SQL. By specializing to a given workload, TailorSQL achieves up to 2$\times$ improvement in execution accuracy on standardized benchmarks.
Related papers
- ODIN: A NL2SQL Recommender to Handle Schema Ambiguity [21.483551391764944]
ODIN generates queries based on different interpretations of ambiguous schema components. Our evaluation shows that ODIN improves the likelihood of generating the correct query by 1.5-2$\times$ compared to baselines.
arXiv Detail & Related papers (2025-05-25T20:22:32Z)
- Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation [26.834687657847454]
Text-to-SQL models are increasingly adopted in real-world applications. Deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios.
arXiv Detail & Related papers (2025-02-21T22:32:35Z)
- SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data [54.69489315952524]
SQLPrompt is designed to improve the few-shot prompting capabilities of LLMs for Text-to-SQL.
SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin.
arXiv Detail & Related papers (2023-11-06T05:24:06Z)
- SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation [16.07396492960869]
We introduce a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks.
Our model predicts queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers.
arXiv Detail & Related papers (2023-10-27T00:13:59Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present Bird, a big benchmark for large-scale database grounded in text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL model, i.e., ChatGPT, achieves only 40.08% execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z)
- XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
- Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL [17.605904256822786]
We present a Relation aware Semi-autoregressive Semantic Parsing (MODN) framework, which is more adaptable for an NL2SQL backbone.
From empirical results and case study, our model shows its effectiveness in learning better word representation in NL2SQL.
arXiv Detail & Related papers (2021-08-02T12:21:08Z)
- Data Agnostic RoBERTa-based Natural Language to SQL Query Generation [0.0]
The NL2SQL task aims at finding deep learning approaches to solve the problem of converting natural language questions into valid SQL queries.
We have presented an approach with data privacy at its core.
Although we have not achieved state-of-the-art results, we have eliminated the need for the table right from the training of the model.
arXiv Detail & Related papers (2020-10-11T13:18:46Z)