Can ChatGPT Reproduce Human-Generated Labels? A Study of Social
Computing Tasks
- URL: http://arxiv.org/abs/2304.10145v2
- Date: Sat, 22 Apr 2023 08:55:33 GMT
- Title: Can ChatGPT Reproduce Human-Generated Labels? A Study of Social
Computing Tasks
- Authors: Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson
- Abstract summary: ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks.
We relabel five datasets covering stance detection (2x), sentiment analysis, hate speech, and bot detection.
Our results highlight that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain.
- Score: 9.740764281808588
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The release of ChatGPT has uncovered a range of possibilities whereby large
language models (LLMs) can substitute human intelligence. In this paper, we
seek to understand whether ChatGPT has the potential to reproduce
human-generated label annotations in social computing tasks. Such an
achievement could significantly reduce the cost and complexity of social
computing research. As such, we use ChatGPT to relabel five seminal datasets
covering stance detection (2x), sentiment analysis, hate speech, and bot
detection. Our results highlight that ChatGPT does have the potential to handle
these data annotation tasks, although a number of challenges remain. ChatGPT
obtains an average accuracy 0.609. Performance is highest for the sentiment
analysis dataset, with ChatGPT correctly annotating 64.9% of tweets. Yet, we
show that performance varies substantially across individual labels. We believe
this work can open up new lines of analysis and act as a basis for future
research into the exploitation of ChatGPT for human annotation tasks.
Related papers
- Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version) [26.643834593780007]
We investigate the extent to which ChatGPT can annotate data for social computing tasks.
ChatGPT exhibits promise in handling data annotation tasks, albeit with some challenges.
We propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task.
arXiv Detail & Related papers (2024-07-08T22:04:30Z) - Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z) - Leveraging ChatGPT As Text Annotation Tool For Sentiment Analysis [6.596002578395151]
ChatGPT is a new product of OpenAI and has emerged as the most popular AI product.
This study explores the use of ChatGPT as a tool for data labeling for different sentiment analysis tasks.
arXiv Detail & Related papers (2023-06-18T12:20:42Z) - ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March, 2023 to now.
arXiv Detail & Related papers (2023-04-27T11:33:48Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - ChatGPT: Jack of all trades, master of none [4.693597927153063]
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT)
We examined ChatGPT's capabilities on 25 diverse analytical NLP tasks.
We automated ChatGPT and GPT-4 prompting process and analyzed more than 49k responses.
arXiv Detail & Related papers (2023-02-21T15:20:37Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation,
and Detection [8.107721810172112]
ChatGPT is able to respond effectively to a wide range of human questions.
People are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society.
In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT.
arXiv Detail & Related papers (2023-01-18T15:23:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.