ChatGPT for Suicide Risk Assessment on Social Media: Quantitative
Evaluation of Model Performance, Potentials and Limitations
- URL: http://arxiv.org/abs/2306.09390v1
- Date: Thu, 15 Jun 2023 16:01:30 GMT
- Title: ChatGPT for Suicide Risk Assessment on Social Media: Quantitative
Evaluation of Model Performance, Potentials and Limitations
- Authors: Hamideh Ghanadian, Isar Nejadgholi, Hussein Al Osman
- Abstract summary: This paper presents a framework for evaluating the interactive ChatGPT model in the context of suicidality assessment from social media posts.
We conduct a technical evaluation of ChatGPT's performance on this task using Zero-Shot and Few-Shot experiments.
Our results indicate that while ChatGPT attains considerable accuracy in this task, transformer-based models fine-tuned on human-annotated datasets exhibit superior performance.
- Score: 5.8762433393846045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel framework for quantitatively evaluating the
interactive ChatGPT model in the context of suicidality assessment from social
media posts, utilizing the University of Maryland Reddit suicidality dataset.
We conduct a technical evaluation of ChatGPT's performance on this task using
Zero-Shot and Few-Shot experiments and compare its results with those of two
fine-tuned transformer-based models. Additionally, we investigate the impact of
different temperature parameters on ChatGPT's response generation and discuss
the optimal temperature based on the inconclusiveness rate of ChatGPT. Our
results indicate that while ChatGPT attains considerable accuracy in this task,
transformer-based models fine-tuned on human-annotated datasets exhibit
superior performance. Moreover, our analysis sheds light on how adjusting
ChatGPT's hyperparameters can improve its ability to assist mental health
professionals in this critical task.
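As a rough illustration of the Zero-Shot setup and temperature analysis described above, the sketch below asks an OpenAI chat model for a single risk label at several temperatures. The prompt wording, the four-level label set, and the model name are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal Zero-Shot sketch: one API call per (post, temperature) pair.
# Prompt text, label set, and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["No Risk", "Low Risk", "Moderate Risk", "Severe Risk"]  # assumed labels

def assess_post(post: str, temperature: float) -> str:
    """Request a single suicide-risk label for one social media post."""
    prompt = (
        "You are assisting with suicide risk assessment research. "
        f"Classify the following post into exactly one of: {', '.join(LABELS)}. "
        "Reply with the label only.\n\n"
        f"Post: {post}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content.strip()

# Sweep the temperature; a reply outside the expected label set is
# counted as inconclusive, mirroring the paper's inconclusiveness rate.
post = "I can't see a way forward anymore."
for t in (0.0, 0.5, 1.0):
    label = assess_post(post, temperature=t)
    print(f"temperature={t}: {label!r} (inconclusive={label not in LABELS})")
```

A Few-Shot variant would prepend a handful of labeled example posts to the prompt; running the sweep over many posts yields the inconclusiveness rate on which the temperature choice is based.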
Related papers
- Using ChatGPT to Score Essays and Short-Form Constructed Responses [0.0]
The investigation covered several prediction models, including linear regression, random forest, and gradient boosting.
ChatGPT's performance was evaluated against human raters using the quadratic weighted kappa (QWK) metric; a minimal QWK computation is sketched after this list.
The study concludes that ChatGPT can complement human scoring but requires further development before it is reliable for high-stakes assessments.
arXiv Detail & Related papers (2024-08-18T16:51:28Z)
- A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets [19.521390684403293]
We present a thorough evaluation of ChatGPT's performance on diverse academic datasets.
Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets.
arXiv Detail & Related papers (2023-05-29T12:37:21Z)
- "HOT" ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media [2.105577305992576]
Generative AI models have the potential to understand and detect harmful content.
ChatGPT can achieve an accuracy of approximately 80% when compared to human annotations.
arXiv Detail & Related papers (2023-04-20T19:40:51Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- Can we trust the evaluation on ChatGPT? [7.036744911062111]
ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks.
However, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model.
We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection.
arXiv Detail & Related papers (2023-03-22T17:32:56Z)
- A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability [57.71052396828714]
This paper presents the first comprehensive analysis of ChatGPT's Text-to-SQL capability.
We conducted experiments on 12 benchmark datasets with different languages, settings, or scenarios.
Although a gap from current state-of-the-art (SOTA) models remains, ChatGPT's performance is impressive.
arXiv Detail & Related papers (2023-03-12T04:22:01Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study will prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
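For the essay-scoring comparison above, the promised QWK sketch: quadratic weighted kappa measures agreement between two raters on an ordinal scale, penalizing large disagreements quadratically. A minimal sketch, assuming scikit-learn and made-up score vectors:

```python
# Quadratic weighted kappa (QWK) between human and model scores.
# The score vectors are hypothetical, for illustration only.
from sklearn.metrics import cohen_kappa_score

human_scores   = [1, 2, 2, 3, 4, 4, 0, 3]  # hypothetical human ratings
chatgpt_scores = [1, 2, 3, 3, 4, 3, 0, 2]  # hypothetical model ratings

qwk = cohen_kappa_score(human_scores, chatgpt_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")  # 1.0 = perfect agreement; 0 = chance-level
```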