A Wide Evaluation of ChatGPT on Affective Computing Tasks
- URL: http://arxiv.org/abs/2308.13911v1
- Date: Sat, 26 Aug 2023 16:10:30 GMT
- Title: A Wide Evaluation of ChatGPT on Affective Computing Tasks
- Authors: Mostafa M. Amin, Rui Mao, Erik Cambria, Björn W. Schuller
- Abstract summary: We study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems.
We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers.
The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of foundation models, a new artificial intelligence
paradigm has emerged: instead of training a separate machine learning model for
each problem, general-purpose foundation models are prompted to solve problems
directly. Such models have been shown to exhibit emergent abilities, solving
problems they were not initially trained on, yet studies of their effectiveness
remain limited. In this work, we widely
study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13
affective computing problems, namely aspect extraction, aspect polarity
classification, opinion extraction, sentiment analysis, sentiment intensity
ranking, emotions intensity ranking, suicide tendency detection, toxicity
detection, well-being assessment, engagement measurement, personality
assessment, sarcasm detection, and subjectivity detection. We introduce a
framework to evaluate the ChatGPT models on regression-based problems, such as
intensity ranking problems, by modelling them as pairwise ranking
classification. We compare ChatGPT against more traditional NLP methods, such
as end-to-end recurrent neural networks and transformers. The results
demonstrate the emergent abilities of the ChatGPT models on a wide range of
affective computing problems, where GPT-3.5 and especially GPT-4 have shown
strong performance on many problems, particularly the ones related to
sentiment, emotions, or toxicity. The ChatGPT models fell short on problems
with implicit signals, such as engagement measurement and subjectivity
detection.
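The pairwise-ranking reformulation described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' exact framework: the `mock_model` keyword heuristic stands in for prompting an LLM with two texts and asking which expresses the stronger intensity.

```python
# Sketch: evaluating a regression-style intensity-ranking task as
# pairwise ranking classification. The model is shown every unordered
# pair of texts and asked which one is more intense; accuracy is
# measured against the gold intensity ordering.
from itertools import combinations

def mock_model(text_a: str, text_b: str) -> str:
    """Hypothetical stand-in for an LLM call: returns 'A' or 'B'.
    A real setup would prompt GPT-3.5/GPT-4; here we use a keyword
    heuristic purely so the sketch runs offline."""
    score = lambda t: t.count("!") + ("furious" in t) * 2 + ("annoyed" in t)
    return "A" if score(text_a) >= score(text_b) else "B"

def pairwise_accuracy(items):
    """items: list of (text, gold_intensity). For every unordered pair,
    ask the model which text is more intense and compare with the gold
    ordering; pairs tied in gold intensity are skipped."""
    correct = total = 0
    for (ta, ya), (tb, yb) in combinations(items, 2):
        if ya == yb:
            continue  # no gold ordering to check against
        pred = mock_model(ta, tb)
        gold = "A" if ya > yb else "B"
        correct += (pred == gold)
        total += 1
    return correct / total if total else 0.0

items = [
    ("I am absolutely furious!!", 0.9),
    ("I'm a bit annoyed.", 0.4),
    ("Everything is fine.", 0.1),
]
print(pairwise_accuracy(items))
```

This reduction lets a classification-only interface be scored on a regression benchmark: pairwise accuracy needs no calibrated numeric outputs, only consistent orderings.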
Related papers
- On Prompt Sensitivity of ChatGPT in Affective Computing
We introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters.
We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection.
arXiv Detail & Related papers (2024-03-20T22:11:01Z)
- DEMASQ: Unmasking the ChatGPT Wordsmith
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Can ChatGPT's Responses Boost Traditional Natural Language Processing?
ChatGPT has shown emergent capabilities, solving problems it was not specifically trained to solve.
Previous work demonstrated these emerging capabilities in affective computing tasks.
We extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together.
arXiv Detail & Related papers (2023-07-06T15:42:05Z)
- Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task
We evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples.
We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty.
arXiv Detail & Related papers (2023-04-18T17:21:48Z)
- ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about
This research examines the responses generated by ChatGPT from different Conversational QA corpora.
The study employed BERT similarity scores to compare these responses with correct answers and to obtain Natural Language Inference (NLI) labels.
The study identified instances where ChatGPT provided incorrect answers to questions, providing insights into areas where the model may be prone to error.
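Embedding-similarity comparison of this kind can be sketched as below. The toy vectors and threshold are illustrative assumptions, not the cited paper's setup; a real pipeline would embed sentences with a BERT-style encoder.

```python
# Sketch: labeling a model response against a gold answer by cosine
# similarity of sentence embeddings. Toy hand-written vectors stand in
# for real BERT embeddings so the example runs offline.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings (a real setup would compute these with BERT).
emb = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "The capital of France is Paris.": [0.88, 0.12, 0.02],
    "Bananas are yellow.": [0.05, 0.2, 0.9],
}

def judge(response, gold, threshold=0.8):
    """Map a similarity score to a coarse NLI-style label; the 0.8
    threshold is an arbitrary illustrative choice."""
    sim = cosine(emb[response], emb[gold])
    return "entailment" if sim >= threshold else "contradiction"

print(judge("The capital of France is Paris.",
            "Paris is the capital of France."))
```

Thresholding a single similarity score is a crude proxy for NLI; it cannot distinguish neutral from contradictory responses, which is why trained NLI models are preferred in practice.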
arXiv Detail & Related papers (2023-04-06T18:42:47Z)
- To ChatGPT, or not to ChatGPT: That is the question!
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- Consistency Analysis of ChatGPT
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT
ChatGPT has demonstrated competent performance across many natural language processing tasks.
We evaluate the capabilities of ChatGPT to perform text classification on three affective computing problems.
arXiv Detail & Related papers (2023-03-03T16:11:37Z)
- ChatGPT: Jack of all trades, master of none
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT).
We examined ChatGPT's capabilities on 25 diverse analytical NLP tasks.
We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses.
arXiv Detail & Related papers (2023-02-21T15:20:37Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.