ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
- URL: http://arxiv.org/abs/2303.15056v2
- Date: Wed, 19 Jul 2023 14:10:55 GMT
- Title: ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
- Authors: Fabrizio Gilardi, Meysam Alizadeh, Ma\"el Kubli
- Abstract summary: We show that ChatGPT outperforms crowd-workers for several annotation tasks.
The per-annotation cost of ChatGPT is less than $0.003 -- about twenty times cheaper than MTurk.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many NLP applications require manual data annotations for a variety of tasks,
notably to train classifiers or evaluate the performance of unsupervised
models. Depending on the size and degree of complexity, the tasks may be
conducted by crowd-workers on platforms such as MTurk as well as trained
annotators, such as research assistants. Using a sample of 2,382 tweets, we
demonstrate that ChatGPT outperforms crowd-workers for several annotation
tasks, including relevance, stance, topics, and frames detection. Specifically,
the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of
five tasks, while ChatGPT's intercoder agreement exceeds that of both
crowd-workers and trained annotators for all tasks. Moreover, the
per-annotation cost of ChatGPT is less than $0.003 -- about twenty times
cheaper than MTurk. These results show the potential of large language models
to drastically increase the efficiency of text classification.
Related papers
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z) - Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines.
In this work, we looked into the causes, and discovered that its subpar performance was caused by the following factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z) - A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark
Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets.
Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP)
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - Exploring the Feasibility of ChatGPT for Event Extraction [31.175880361951172]
Event extraction is a fundamental task in natural language processing that involves identifying and extracting information about events mentioned in text.
ChatGPT provides an opportunity to solve language tasks with simple prompts without the need for task-specific datasets and fine-tuning.
We show that ChatGPT has, on average, only 51.04% of the performance of a task-specific model such as EEQA in long-tail and complex scenarios.
arXiv Detail & Related papers (2023-03-07T12:03:58Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on
Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.