Ar-Spider: Text-to-SQL in Arabic
- URL: http://arxiv.org/abs/2402.15012v1
- Date: Thu, 22 Feb 2024 23:11:17 GMT
- Title: Ar-Spider: Text-to-SQL in Arabic
- Authors: Saleh Almohaimeed, Saad Almohaimeed, Mansour Al Ghanim, Liqiang Wang
- Abstract summary: This paper introduces Ar-Spider, the first Arabic cross-domain text-to-SQL dataset.
Due to the unique nature of the language, two major challenges have been encountered, namely linguistic and structural challenges.
We propose the context similarity relationship (CSR) approach, which results in a significant increase in the overall performance of about 1.52% for S2SQL and 1.06% for LGESQL and closes the gap between the Arabic and English languages to 7.73%.
- Score: 11.463438573648297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Natural Language Processing (NLP), one of the most important tasks is
text-to-SQL semantic parsing, which focuses on enabling users to interact with
the database in a more natural manner. In recent years, text-to-SQL has made
significant progress, but most work has been English-centric. In this paper, we
introduce Ar-Spider, the first Arabic cross-domain text-to-SQL dataset. Due
to the unique nature of the language, two major challenges have been
encountered, namely schema linguistic and SQL structural challenges. In order
to handle these issues and conduct the experiments, we adopt two baseline
models LGESQL [4] and S2SQL [12], both of which are tested with two
cross-lingual models to alleviate the effects of schema linguistic and SQL
structure linking challenges. The baselines demonstrate decent single-language
performance on our Arabic text-to-SQL dataset, Ar-Spider, achieving 62.48% for
S2SQL and 65.57% for LGESQL, only 8.79% below the highest results achieved by
the baselines when trained on the English dataset. To achieve better performance on
Arabic text-to-SQL, we propose the context similarity relationship (CSR)
approach, which results in a significant increase in the overall performance of
about 1.52% for S2SQL and 1.06% for LGESQL and closes the gap between Arabic
and English languages to 7.73%.
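The percentages above come from standard text-to-SQL evaluation. One widely used variant of the metric is execution accuracy: a predicted query counts as correct when it returns the same rows as the gold query on the target database. The sketch below illustrates that idea only; the SQLite backend, the `singer` schema, and the queries are hypothetical assumptions, not details taken from the paper (which may use exact set match instead).

```python
# Illustrative sketch of execution-accuracy scoring for text-to-SQL.
# All table names and queries are hypothetical examples.
import sqlite3

def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """True if both queries yield the same multiset of rows on db."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # an unexecutable prediction simply scores as wrong
    gold_rows = db.execute(gold_sql).fetchall()
    return sorted(pred_rows) == sorted(gold_rows)

# Tiny in-memory database standing in for a Spider-style schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("Fairuz", "Lebanon"), ("Adele", "UK")])

print(execution_match(conn,
                      "SELECT name FROM singer WHERE country = 'Lebanon'",
                      "SELECT name FROM singer WHERE country = 'Lebanon'"))  # True
```

A corpus-level accuracy is then just the fraction of examples for which this check passes.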
Related papers
- SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL [3.422309388045878]
We introduce SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought, self-correction, and ensemble methods.
Specifically, when configured using GPT-3.5-Turbo as the base LLM, SelECT-SQL achieves 84.2% execution accuracy on the Spider leaderboard's development set.
arXiv Detail & Related papers (2024-09-16T05:40:18Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL Evaluation.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present BIRD, a big benchmark for large-scale databases grounded in text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL models, e.g., ChatGPT, achieve only 40.08% in execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z) - MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic
Parsing [48.216386761482525]
We present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).
Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages.
We also propose a simple schema augmentation framework, SAVe (Schema-Augmentation-with-Verification), which boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
arXiv Detail & Related papers (2022-12-27T13:58:30Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for
Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
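The retrieval step described for XRICL can be pictured as scoring stored English (question, SQL) exemplars against the incoming query and keeping the top-k for the prompt. XRICL learns a trained retriever; the sketch below substitutes a plain bag-of-words cosine similarity purely for illustration, and all exemplar texts and SQL are made-up examples.

```python
# Hedged sketch of exemplar retrieval for in-context text-to-SQL:
# rank (question, SQL) pairs by lexical similarity to the user query.
# A real system (e.g., XRICL) uses a learned, cross-lingual retriever.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, exemplars: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k exemplars whose questions best match the query."""
    qv = Counter(query.lower().split())
    return sorted(exemplars,
                  key=lambda ex: cosine(qv, Counter(ex[0].lower().split())),
                  reverse=True)[:k]

# Hypothetical exemplar pool.
exemplars = [
    ("how many singers are there", "SELECT count(*) FROM singer"),
    ("list all airport names", "SELECT name FROM airports"),
    ("how many airports are there", "SELECT count(*) FROM airports"),
]
print(retrieve("how many concerts are there", exemplars, k=2))
```

The selected pairs would then be concatenated into the LLM prompt ahead of the target question, which is the in-context learning setup the abstract describes.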
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future
Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z) - Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL [17.605904256822786]
We present a Relation aware Semi-autoregressive Semantic Parsing (MODN) framework, which is more adaptable for NL2SQL.
From empirical results and a case study, our model shows its effectiveness in learning better word representations for NL2SQL.
arXiv Detail & Related papers (2021-08-02T12:21:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.