Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
- URL: http://arxiv.org/abs/2406.15966v1
- Date: Sun, 23 Jun 2024 00:11:07 GMT
- Title: Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
- Authors: Hassan Alhuzali, Ashwag Alasmari
- Abstract summary: Pre-trained Language Models (PLMs) have the potential to transform mental health support.
This study evaluates the effectiveness of PLMs for classification of Questions and Answers in the domain of mental health care.
- Score: 0.18416014644193068
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained Language Models (PLMs) have the potential to transform mental health support by providing accessible and culturally sensitive resources. Despite this potential, however, their effectiveness in mental health care, and specifically for the Arabic language, has not been extensively explored. To bridge this gap, this study evaluates the effectiveness of foundational models for the classification of Questions and Answers (Q&A) in the domain of mental health care. We leverage the MentalQA dataset, an Arabic collection of Q&A interactions related to mental health. We conducted experiments using four types of learning approaches: traditional feature extraction, PLMs as feature extractors, fine-tuning PLMs, and prompting large language models (GPT-3.5 and GPT-4) in zero-shot and few-shot settings. While traditional feature extractors combined with Support Vector Machines (SVMs) showed promising performance, PLMs exhibited even better results owing to their ability to capture semantic meaning. For example, MARBERT achieved the highest performance, with a Jaccard score of 0.80 for question classification and 0.86 for answer classification. We further conducted an in-depth analysis, examining the effect of fine-tuning versus no fine-tuning and the impact of training-data size, and performed an error analysis. Our analysis shows that fine-tuning is beneficial for PLM performance and that the size of the training data plays a crucial role in achieving high performance. We also explored prompting, where few-shot learning with GPT-3.5 yielded promising results: an improvement of 12% for question classification and 45% for answer classification. Based on these findings, we conclude that PLMs and prompt-based approaches hold promise for mental health support in Arabic.
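To make the evaluation setup concrete, below is a minimal sketch of the kind of traditional feature-extraction baseline and Jaccard-score evaluation the abstract describes: TF-IDF features fed into a one-vs-rest linear SVM for multi-label classification. The texts, label names, and n-gram settings are illustrative assumptions, not the actual MentalQA schema or the authors' exact configuration.

```python
# Hypothetical sketch: a traditional feature-extraction baseline for multi-label
# Q&A classification, evaluated with the sample-averaged Jaccard score.
# Texts and labels below are placeholders, not the real MentalQA data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import jaccard_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy question texts and multi-label annotations (placeholders).
train_texts = ["...question 1...", "...question 2...", "...question 3..."]
train_labels = [["diagnosis"], ["treatment", "diagnosis"], ["other"]]
test_texts = ["...question 4..."]
test_labels = [["treatment"]]

# Binarize the label sets so each example maps to a 0/1 indicator vector.
mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(train_labels)
y_test = mlb.transform(test_labels)

# TF-IDF features (character n-grams are often a reasonable choice for Arabic)
# fed into a one-vs-rest linear SVM.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    OneVsRestClassifier(LinearSVC()),
)
model.fit(train_texts, y_train)

# Sample-averaged Jaccard score, the metric reported in the abstract.
preds = model.predict(test_texts)
print("Jaccard score:", jaccard_score(y_test, preds, average="samples"))
```

A fine-tuned PLM such as MARBERT would replace the TF-IDF/SVM pipeline with a transformer encoder and a multi-label classification head, but the same sample-averaged Jaccard score applies to its predictions.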
Related papers
- Large Language Models for Patient Comments Multi-Label Classification [3.670008893193884]
This research explores leveraging Large Language Models (LLMs) in conducting Multi-label Text Classification (MLTC) of inpatient comments.
GPT-4 Turbo was leveraged to conduct the classification.
Within a prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were explored (a minimal prompting sketch in this spirit appears after this list).
arXiv Detail & Related papers (2024-10-31T00:29:52Z) - Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support [0.0]
Two large language models, GPT-4 and Chat-GPT, were tested in responding to a set of 18 psychological prompts.
GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52.
arXiv Detail & Related papers (2024-05-15T12:44:54Z) - Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model [3.3590922002216193]
We use model-agnostic meta-learning and leverage large language models (LLMs) to address this gap.
We first apply a meta-learning model with self-supervision, which results in improved model initialisation for rapid adaptation and cross-lingual transfer.
In parallel, we use LLMs' in-context learning capabilities to assess their accuracy on the Swahili mental health prediction tasks.
arXiv Detail & Related papers (2024-04-13T17:11:35Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - LLaMA Beyond English: An Empirical Study on Language Capability Transfer [49.298360366468934]
We focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language.
We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer.
We employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench.
arXiv Detail & Related papers (2024-01-02T06:29:02Z) - Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data [42.965788205842465]
We present a comprehensive evaluation of multiple large language models (LLMs) on various mental health prediction tasks.
We conduct experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning.
Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 by 10.9% on balanced accuracy and the best prompt design of GPT-4 (250 and 150 times larger, respectively) by 4.8%.
arXiv Detail & Related papers (2023-07-26T06:00:50Z) - Evaluating the Performance of Large Language Models on GAOKAO Benchmark [53.663757126289795]
This paper introduces GAOKAO-Bench, an intuitive benchmark that employs questions from the Chinese GAOKAO examination as test samples.
With human evaluation, we obtain the converted total score of LLMs, including GPT-4, ChatGPT and ERNIE-Bot.
We also use LLMs to grade the subjective questions, and find that model scores achieve a moderate level of consistency with human scores.
arXiv Detail & Related papers (2023-05-21T14:39:28Z) - Towards Interpretable Mental Health Analysis with Large Language Models [27.776003210275608]
We evaluate the mental health analysis and emotional reasoning ability of large language models (LLMs) on 11 datasets across 5 tasks.
Based on prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions.
We conduct strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations.
arXiv Detail & Related papers (2023-04-06T19:53:59Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
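The prompting results above (few-shot GPT-3.5 in the main abstract, and the zero-shot, in-context, and chain-of-thought setups in the first related paper) follow a common pattern. Below is a minimal, hypothetical sketch of few-shot prompting for multi-label question classification using the OpenAI Python SDK; the label set, prompt wording, and model name are placeholder assumptions, not the exact configurations used in those papers.

```python
# Hypothetical sketch of few-shot prompting for multi-label question
# classification. Prompt wording, label taxonomy, and model name are
# illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["diagnosis", "treatment", "other"]  # placeholder label taxonomy

FEW_SHOT_EXAMPLES = [
    ("What medication helps with panic attacks?", ["treatment"]),
    ("Do I have depression if I cannot sleep?", ["diagnosis"]),
]

def classify_question(question: str) -> list[str]:
    """Ask the model to assign one or more labels from LABELS to a question."""
    examples = "\n".join(
        f"Question: {q}\nLabels: {', '.join(labels)}"
        for q, labels in FEW_SHOT_EXAMPLES
    )
    prompt = (
        f"Assign one or more of these labels to the question: {', '.join(LABELS)}.\n"
        "Answer with a comma-separated list of labels only.\n\n"
        f"{examples}\n\nQuestion: {question}\nLabels:"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    raw = resp.choices[0].message.content
    # Keep only labels that belong to the predefined set.
    return [label.strip() for label in raw.split(",") if label.strip() in LABELS]

print(classify_question("How can I support a friend with anxiety?"))
```

The returned label list can be binarized with the same MultiLabelBinarizer shown earlier, so prompting and fine-tuning approaches can be scored with an identical Jaccard-based evaluation.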