Related papers: Prompt Engineering Techniques for Context-dependent Text-to-SQL in Arabic

URL: http://arxiv.org/abs/2511.20677v1
Date: Sun, 16 Nov 2025 00:05:40 GMT
Title: Prompt Engineering Techniques for Context-dependent Text-to-SQL in Arabic
Authors: Saleh Almohaimeed, May Alsofyani, Saad Almohaimeed, Mansour Al Ghanim, Liqiang Wang
Abstract summary: We introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset. The dataset consists of 3,450 sequences of interrelated questions, each sequence containing an average of approximately three questions. We conducted 40 experiments on the Ar-SParC dataset using two large language models, GPT-3.5-turbo and GPT-4.5-turbo.
Score: 2.8855202197281007
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, the task of cross-domain, context-dependent text-to-SQL has received significant attention. It enables users with no prior knowledge of SQL to have a conversation with databases using natural language. However, most of the available datasets and research have been conducted in English, along with some work in Chinese. To date, no effort has been made to address this task in the Arabic language. In this paper, we introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset. The dataset consists of 3,450 sequences of interrelated questions, each sequence containing an average of approximately three questions, which results in a total of 10,225 questions along with their corresponding SQL queries. We conducted 40 experiments on the Ar-SParC dataset using two large language models, GPT-3.5-turbo and GPT-4.5-turbo, applying 10 different prompt engineering techniques, including four question representation methods and six in-context learning techniques. Furthermore, we developed a novel approach named GAT corrector, which enhanced the performance across all 40 experiments, yielding an average improvement of 1.9% in execution accuracy (EX) and 1.9% in interaction accuracy (IX) under zero-shot settings, and an average increase of 1.72% EX and 0.92% IX under in-context learning settings. Finally, we conducted an ablation study with two more experiments to explain why the GAT corrector outperformed the previous GAT verifier technique, particularly for the Arabic language.
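
The abstract reports two metrics, execution accuracy (EX) and interaction accuracy (IX), without defining them here. The sketch below illustrates one common reading, which is an assumption rather than the paper's evaluation code: EX is the share of questions whose predicted SQL returns the same rows as the gold SQL, and IX is the share of question sequences in which every question is answered correctly. The `singer` table, the queries, and the helper names `run` and `ex_and_ix` are hypothetical.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute a query; return its rows as an order-insensitive multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def ex_and_ix(conn, interactions):
    """Compute (EX, IX) under the assumed definitions.

    interactions: list of sequences; each sequence is a list of
    (predicted_sql, gold_sql) pairs for one conversation.
    """
    total_q = correct_q = correct_seq = 0
    for seq in interactions:
        all_ok = True
        for pred, gold in seq:
            gold_rows = run(conn, gold)
            # A question counts as correct only if both queries run and agree.
            ok = gold_rows is not None and run(conn, pred) == gold_rows
            total_q += 1
            correct_q += ok
            all_ok = all_ok and ok
        correct_seq += all_ok
    return correct_q / total_q, correct_seq / len(interactions)


# Toy database and two hypothetical two-question interactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('A', 30), ('B', 45), ('C', 52);
""")

interactions = [
    [  # both predictions match the gold results
        ("SELECT name FROM singer WHERE age > 40",
         "SELECT name FROM singer WHERE age > 40"),
        ("SELECT count(*) FROM singer WHERE age > 40",
         "SELECT count(*) FROM singer WHERE age > 40"),
    ],
    [  # the first prediction drops the filter, so the whole interaction fails
        ("SELECT name FROM singer",
         "SELECT name FROM singer WHERE age < 40"),
        ("SELECT age FROM singer WHERE name = 'A'",
         "SELECT age FROM singer WHERE name = 'A'"),
    ],
]

ex, ix = ex_and_ix(conn, interactions)
print(f"EX = {ex:.2f}, IX = {ix:.2f}")
```

In this toy case three of four questions are correct but only one of two interactions is fully correct, which shows why IX is the stricter metric: a single wrong turn fails the whole sequence.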

Related papers

Text-to-SQL Oriented to the Process Mining Domain: A PT-EN Dataset for Query Translation [0.10499611180329804]
This paper introduces text-2-SQL-4-PM, a benchmark dataset for the text-to-SQL task in the process mining domain. The dataset comprises 1,655 natural language utterances, including human-generated paraphrases, 205 SQL statements, and ten qualifiers. The results show that text-2-SQL-4-PM supports evaluation of text-to-SQL implementations, offering broader applicability for semantic parsing and other natural language processing tasks.
arXiv Detail & Related papers (2025-08-18T01:25:41Z)
SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL [3.422309388045878]
We introduce SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought, self-correction, and ensemble methods. Specifically, when configured using GPT as the base LLM, SelECT-SQL achieves 84.2% execution accuracy on the Spider leaderboard's development set.
arXiv Detail & Related papers (2024-09-16T05:40:18Z)
Ar-Spider: Text-to-SQL in Arabic [11.463438573648297]
This paper introduces Ar-Spider, the first Arabic cross-domain text-to-SQL dataset. Due to the unique nature of the language, two major challenges have been encountered, namely linguistic and structural challenges. We propose the context similarity relationship (CSR) approach, which results in a significant increase in the overall performance of about 1.52% for S2SQL and 1.06% for LGESQL and closes the gap between Arabic and English languages to 7.73%.
arXiv Detail & Related papers (2024-02-22T23:11:17Z)
Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning [67.7258569181669]
This dataset demonstrates a significantly higher level of complexity compared to existing publicly available datasets. Archer challenges the capabilities of current state-of-the-art models, with a high-ranked model on the Spider leaderboard achieving only 6.73% execution accuracy on Archer test set.
arXiv Detail & Related papers (2024-02-19T21:24:36Z)
A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention [0.0]
We present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens. In addition, we used a multilingual approach with the mT5-large model fine-tuned with a data-augmented Spider dataset in four languages simultaneously.
arXiv Detail & Related papers (2023-06-25T14:28:12Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the SQL-PaLM framework for enhancing Text-to-SQL using large language models (LLMs). With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems. It is composed of publicly available text-to-SQL datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present Bird, a big benchmark for large-scale database grounded in text-to-SQL tasks. Our emphasis on database values highlights the new challenges of dirty database contents. Even the most effective text-to-SQL models, i.e. ChatGPT, achieve only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z)
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing [48.216386761482525]
We present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. We also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
arXiv Detail & Related papers (2022-12-27T13:58:30Z)
"What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [49.85635994436742]
We include a human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users using multi-choice questions. PIIA is capable of enhancing the text-to-SQL performance with limited interaction turns by using both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z)
