Will Affective Computing Emerge from Foundation Models and General AI? A
First Evaluation on ChatGPT
- URL: http://arxiv.org/abs/2303.03186v1
- Date: Fri, 3 Mar 2023 16:11:37 GMT
- Authors: Mostafa M. Amin, Erik Cambria, Björn W. Schuller
- Abstract summary: ChatGPT has demonstrated competent performance across many natural language processing tasks.
We evaluate the capabilities of ChatGPT to perform text classification on three affective computing problems.
- Score: 12.456183060562317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ChatGPT has shown the potential of emerging general artificial intelligence
capabilities, as it has demonstrated competent performance across many natural
language processing tasks. In this work, we evaluate the capabilities of
ChatGPT to perform text classification on three affective computing problems,
namely, big-five personality prediction, sentiment analysis, and suicide
tendency detection. We utilise three baselines: a robust language model
(RoBERTa-base), a legacy word model with pretrained embeddings (Word2Vec), and
a simple bag-of-words baseline (BoW). Results show that a RoBERTa model trained
for a specific downstream task generally performs best. ChatGPT, on the other
hand, provides decent results that are roughly comparable to the Word2Vec and
BoW baselines. ChatGPT further shows robustness against noisy data, where the
Word2Vec models achieve worse results. The results indicate that ChatGPT is a
good generalist model, capable of achieving good results across various
problems without any specialised training; however, it does not match a model
specialised for a given downstream task.
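To make the baseline comparison concrete, the simplest of the three baselines, bag-of-words, can be sketched as a nearest-neighbour classifier over word-count vectors. This is an illustrative toy, not the authors' exact setup: the training sentences, the sentiment task framing, and the nearest-neighbour decision rule are all assumptions for demonstration.

```python
from collections import Counter
import math

def bow_vector(text):
    """Lowercased unigram counts: the simplest bag-of-words representation."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy training data for a sentiment task (not the paper's datasets).
train = [
    ("i love this movie it is wonderful", "positive"),
    ("what a great and happy day", "positive"),
    ("i hate this it is terrible", "negative"),
    ("a sad and awful experience", "negative"),
]

def classify(text):
    """Label a text with its nearest training example's label (BoW + cosine)."""
    vec = bow_vector(text)
    best_label, best_sim = None, -1.0
    for doc, label in train:
        sim = cosine(vec, bow_vector(doc))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

print(classify("this movie is terrible"))  # prints "negative" on this toy data
```

The stronger baselines replace the count vectors with pretrained Word2Vec embeddings or fine-tune RoBERTa end-to-end, but the overall pipeline of vectorise-then-classify is the same.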
Related papers
- An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation [97.3797716862478]
Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation.
Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label.
This work proposes an energy-based model for WLAC, which enables the context hidden vector to capture crucial information from the source sentence.
arXiv Detail & Related papers (2024-07-29T15:07:19Z)
- A Wide Evaluation of ChatGPT on Affective Computing Tasks [32.557383931586266]
We study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems.
We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers.
The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems.
arXiv Detail & Related papers (2023-08-26T16:10:30Z)
- Can ChatGPT's Responses Boost Traditional Natural Language Processing? [12.456183060562317]
ChatGPT has shown the potential of emergent capabilities to solve problems without being specifically trained to solve them.
Previous work demonstrated these emerging capabilities in affective computing tasks.
We extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together.
arXiv Detail & Related papers (2023-07-06T15:42:05Z)
- ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about [15.19126287569545]
This research examines the responses generated by ChatGPT from different Conversational QA corpora.
The study employed BERT similarity scores to compare these responses with correct answers and obtain Natural Language Inference (NLI) labels.
The study identified instances where ChatGPT provided incorrect answers to questions, providing insights into areas where the model may be prone to error.
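The similarity-checking step described above can be sketched as thresholding the cosine similarity between a response embedding and a gold-answer embedding, flagging low-similarity pairs for closer inspection. This is a minimal sketch under assumptions: the 3-dimensional vectors stand in for real BERT embeddings, and the threshold value and `flag_suspect_answers` helper are hypothetical, not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def flag_suspect_answers(pairs, threshold=0.7):
    """Return indices of (response_vec, gold_vec) pairs whose similarity
    falls below the threshold -- candidates for manual NLI labelling."""
    return [i for i, (r, g) in enumerate(pairs) if cosine(r, g) < threshold]

# Toy 3-d vectors standing in for real BERT embeddings (hypothetical values).
pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),  # near-parallel: likely correct answer
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal: likely incorrect answer
]
print(flag_suspect_answers(pairs))  # prints [1]
```

In practice the embeddings would come from a BERT encoder and the flagged pairs would be checked with an NLI model or by hand, as the study describes.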
arXiv Detail & Related papers (2023-04-06T18:42:47Z)
- Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study can prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
- ChatGPT: Jack of all trades, master of none [4.693597927153063]
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT).
We examined ChatGPT's capabilities on 25 diverse analytical NLP tasks.
We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses.
arXiv Detail & Related papers (2023-02-21T15:20:37Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.