A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
- URL: http://arxiv.org/abs/2303.13547v1
- Date: Sun, 12 Mar 2023 04:22:01 GMT
- Title: A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
- Authors: Aiwei Liu, Xuming Hu, Lijie Wen, Philip S. Yu
- Abstract summary: This paper presents the first comprehensive analysis of ChatGPT's Text-to- abilities.
We conducted experiments on 12 benchmark datasets with different languages, settings, or scenarios.
Although there is still a gap from the current state-of-the-art (SOTA) model performance, ChatGPT's performance is still impressive.
- Score: 57.71052396828714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the first comprehensive analysis of ChatGPT's Text-to-SQL
ability. Given the recent emergence of large-scale conversational language
model ChatGPT and its impressive capabilities in both conversational abilities
and code generation, we sought to evaluate its Text-to-SQL performance. We
conducted experiments on 12 benchmark datasets with different languages,
settings, or scenarios, and the results demonstrate that ChatGPT has strong
text-to-SQL abilities. Although there is still a gap from the current
state-of-the-art (SOTA) model performance, considering that the experiment was
conducted in a zero-shot scenario, ChatGPT's performance is still impressive.
Notably, in the ADVETA (RPL) scenario, the zero-shot ChatGPT even outperforms
the SOTA model that requires fine-tuning on the Spider dataset by 4.1\%,
demonstrating its potential for use in practical applications. To support
further research in related fields, we have made the data generated by ChatGPT
publicly available at https://github.com/THU-BPM/chatgpt-sql.
Related papers
- Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco
vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison [18.092211166785397]
In recent times, a number of models have emerged, claiming performance near that of GPT-3.5 or GPT-4.
We pit six popular large language models against each other, systematically evaluating their Text-to- parsing capability on nine benchmark datasets.
Regrettably, the open-sourced models fell significantly short of the performance achieved by closed-source models like GPT-3.5, highlighting the need for further work to bridge the performance gap between these models.
arXiv Detail & Related papers (2023-10-16T08:52:41Z) - A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark
Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets.
Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z) - GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content [27.901155229342375]
We present a novel approach for detecting ChatGPT-generated vs. human-written text using language models.
Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics.
arXiv Detail & Related papers (2023-05-13T17:12:11Z) - ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March, 2023 to now.
arXiv Detail & Related papers (2023-04-27T11:33:48Z) - Evaluating ChatGPT's Information Extraction Capabilities: An Assessment
of Performance, Explainability, Calibration, and Faithfulness [18.945934162722466]
We focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks.
ChatGPT's performance in Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting.
ChatGPT provides high-quality and trustworthy explanations for its decisions.
arXiv Detail & Related papers (2023-04-23T12:33:18Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP)
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.