Related papers: SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

URL: http://arxiv.org/abs/2208.12711v1
Date: Fri, 26 Aug 2022 15:11:10 GMT
Title: SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset
Authors: Saihao Huang, Lijie Wang, Zhenghua Li, Zeyang Liu, Chenhui Dou, Fukang Yan, Xinyan Xiao, Hua Wu, Min Zhang
Abstract summary: CHASE contains 2,003 sessions manually constructed from scratch (CHASE-C) and 3,456 sessions translated from English (CHASE-T). We find the two parts are highly discrepant and incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL parsing dataset in Chinese, consisting of 5,028 sessions all manually constructed from scratch.
Score: 39.78074639729293
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As the first session-level Chinese dataset, CHASE contains two separate parts, i.e., 2,003 sessions manually constructed from scratch (CHASE-C), and 3,456 sessions translated from English SParC (CHASE-T). We find the two parts are highly discrepant and incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL dataset in Chinese, consisting of 5,028 sessions all manually constructed from scratch. In order to guarantee data quality, we adopt an iterative annotation workflow to facilitate intense and in-time review of previous-round natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, allowing SeSQL to be used as the largest dataset for single-round multi-DB text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL by employing three competitive session-level parsers, and present detailed analysis.

Related papers

Text-to-SQL Oriented to the Process Mining Domain: A PT-EN Dataset for Query Translation [0.10499611180329804]
This paper introduces text-2-SQL-4-PM, a benchmark dataset for the text-to-SQL task in the process mining domain. The dataset comprises 1,655 natural language utterances, including human-generated paraphrases, 205 SQL statements, and ten qualifiers. The results show that text-2-SQL-4-PM supports evaluation of text-to-SQL implementations, offering broader applicability for semantic parsing and other natural language processing tasks.
arXiv Detail & Related papers (2025-08-18T01:25:41Z)
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale [31.852909145101677]
We propose a novel and scalable text-to-SQL data synthesis framework for automatically synthesizing large-scale, high-quality, and diverse datasets without extensive human intervention. We introduce SynSQL-2.5M, the first million-scale text-to-SQL dataset, containing 2.5 million samples spanning over 16,000 synthetic databases. We develop OmniSQL, a powerful open-source text-to-SQL model available in three sizes: 7B, 14B, and 32B.
arXiv Detail & Related papers (2025-03-04T03:30:56Z)
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? [32.84561352339466]
We provide a review of Text-to-SQL translation techniques powered by Large Language Models (LLMs). We discuss the research challenges and open problems of Text-to-SQL evaluation in the LLM era.
arXiv Detail & Related papers (2024-08-09T14:59:36Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces SQL-PaLM, a framework for enhancing Text-to-SQL using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep into understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems. It is composed of publicly available text-to-SQL datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset [40.43236560948185]
We present a large-scale CrosS-Schema Chinese text-to-SQL dataset to carry on corresponding studies. CSS originally consisted of 4,340 question/SQL pairs across 2 databases. In order to generalize models to different medical systems, we create 19 new databases along with 29,280 corresponding examples.
arXiv Detail & Related papers (2023-05-25T09:44:44Z)
Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages. We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z)
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases. Deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR. Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries. Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.