A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context
- URL: http://arxiv.org/abs/2501.06859v1
- Date: Sun, 12 Jan 2025 16:17:25 GMT
- Title: A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context
- Authors: Noureldin Zahran, Aya E. Fouda, Radwa J. Hanafy, Mohammed E. Fouda
- Abstract summary: Mental health disorders pose a growing public health concern in the Arab world.
This study comprehensively evaluates 8 large language models (LLMs) on diverse mental health datasets.
- Abstract: Mental health disorders pose a growing public health concern in the Arab world, emphasizing the need for accessible diagnostic and intervention tools. Large language models (LLMs) offer a promising approach, but their application in Arabic contexts faces challenges including limited labeled datasets, linguistic complexity, and translation biases. This study comprehensively evaluates 8 LLMs, both general multilingual models and bilingual ones, on diverse mental health datasets (such as AraDepSu, Dreaddit, MedMCQA), investigating the impact of prompt design, language configuration (native Arabic vs. translated English, and vice versa), and few-shot prompting on diagnostic performance. We find that prompt design significantly influences LLM scores, mainly through its effect on instruction following: our structured prompt outperformed a less structured variant on multi-class datasets by an average of 14.5%. While the influence of language on performance was modest, model selection proved crucial: Phi-3.5 MoE excelled in balanced accuracy, particularly for binary classification, while Mistral NeMo showed superior performance in mean absolute error for severity prediction tasks. Few-shot prompting consistently improved performance, with particularly substantial gains for GPT-4o Mini on multi-class classification, boosting accuracy by an average factor of 1.58. These findings underscore the importance of prompt optimization, multilingual analysis, and few-shot learning for developing culturally sensitive and effective LLM-based mental health tools for Arabic-speaking populations.
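As a rough illustration of the setup the abstract describes, the sketch below builds a structured multi-class prompt with optional few-shot examples and scores predictions with balanced accuracy; the label set, prompt wording, and helper names are illustrative assumptions, not the paper's actual templates (severity tasks would swap in mean absolute error analogously).

```python
# A sketch of structured + few-shot prompting for multi-class classification,
# scored with balanced accuracy. Labels, wording, and records are hypothetical.
from sklearn.metrics import balanced_accuracy_score

LABELS = ["depression", "anxiety", "bipolar", "none"]  # hypothetical label set

def build_prompt(post, few_shot=()):
    """Compose a structured prompt; `few_shot` holds (post, label) examples."""
    lines = [
        "Task: classify the mental health condition expressed in the post.",
        f"Allowed labels: {', '.join(LABELS)}.",
        "Answer with exactly one label and nothing else.",
    ]
    for ex_post, ex_label in few_shot:  # optional few-shot demonstrations
        lines += [f"Post: {ex_post}", f"Label: {ex_label}"]
    lines += [f"Post: {post}", "Label:"]
    return "\n".join(lines)

def evaluate(model_fn, dataset, few_shot=()):
    """`model_fn` maps a prompt string to the model's label string."""
    y_true = [label for _, label in dataset]
    y_pred = [model_fn(build_prompt(post, few_shot)).strip().lower()
              for post, _ in dataset]
    return balanced_accuracy_score(y_true, y_pred)
```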
Related papers
- Bridging Language Barriers in Healthcare: A Study on Arabic LLMs [1.2006896500048552]
This paper investigates the challenges of developing large language models proficient in both multilingual understanding and medical knowledge.
We find that larger models with carefully calibrated language ratios achieve superior performance on native-language clinical tasks.
arXiv Detail & Related papers (2025-01-16T20:24:56Z)
- SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment [78.4550589538805]
We propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism.
Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5-8% of all parameters in 7B and 13B LLMs.
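A minimal sketch of what such selective tuning can look like for a LLaMA-style model in Hugging Face Transformers is shown below; the model name, layer indices, and the `mlp` module path are placeholder assumptions, since SLAM selects its layers from its own analysis.

```python
# Freeze everything, then unfreeze only the feed-forward (MLP) sub-layers of a
# few chosen layers. Model name and layer indices are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
for param in model.parameters():
    param.requires_grad = False  # freeze the full network

selected = {26, 27, 28, 29, 30, 31}  # placeholder choice of 6 layers
for idx, layer in enumerate(model.model.layers):
    if idx in selected:
        for param in layer.mlp.parameters():  # feed-forward sub-layer only
            param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.1%}")
```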
arXiv Detail & Related papers (2025-01-07T10:29:43Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
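A minimal sketch of the zero-shot cueing idea, assuming an illustrative item and paraphrased guidance rather than LlaMADRS's actual prompt (MADRS items are rated on a 0-6 scale):

```python
# Zero-shot prompt for scoring one MADRS item from a transcript. The cue text
# is an illustrative paraphrase, not the framework's actual prompt.
def madrs_item_prompt(transcript, item, cue):
    return (
        f"You are rating the MADRS item '{item}' from a clinical interview.\n"
        f"Scoring guidance: {cue}\n"
        "Respond with a single integer from 0 (absent) to 6 (severe).\n\n"
        f"Interview transcript:\n{transcript}\n\nScore:"
    )

prompt = madrs_item_prompt(
    transcript="Clinician: How has your sleep been lately? Patient: ...",
    item="Reduced sleep",
    cue="Rate the reduction in duration or depth of sleep compared to normal.",
)
```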
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
- Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models [0.0]
This case study evaluates Sporo AraSum, a language model tailored for Arabic clinical documentation, against JAIS, the leading Arabic NLP model.
Results indicate that Sporo AraSum significantly outperforms JAIS in AI-centric quantitative metrics and all qualitative attributes measured in our modified version of the PDQI-9.
arXiv Detail & Related papers (2024-11-20T18:10:19Z)
- Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset [3.4146360486107987]
Large Language Models (LLMs) are increasingly integrated into various medical fields, including mental health support systems.
We present a novel multilingual adaptation of widely-used mental health datasets, translated from English into six languages.
This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages.
arXiv Detail & Related papers (2024-09-25T22:14:34Z)
- How Does Quantization Affect Multilingual LLMs? [50.867324914368524]
Quantization techniques are widely used to improve inference speed and deployment of large language models.
We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales.
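For concreteness, the sketch below loads a multilingual LLM with 4-bit bitsandbytes quantization via Hugging Face Transformers; the model name is a placeholder, and the study's exact models and quantization schemes may differ.

```python
# Load a multilingual LLM with 4-bit bitsandbytes quantization. The model name
# is a placeholder; the study's models and quantization schemes may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # weight-only 4-bit quantization
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/aya-23-8B",                # placeholder multilingual model
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-23-8B")
```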
arXiv Detail & Related papers (2024-07-03T15:39:40Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
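A hedged sketch of what question translation training data could look like, assuming an Alpaca-style instruction record format that the paper may not use verbatim:

```python
# Build an X->English question translation pair for alignment training.
# The instruction wording and record format are assumptions.
def make_alignment_example(question_src, question_en, lang):
    return {
        "instruction": f"Translate the following {lang} question into English.",
        "input": question_src,
        "output": question_en,
    }

example = make_alignment_example(
    question_src="ما مجموع ٣ و ٥؟",        # "What is the sum of 3 and 5?"
    question_en="What is the sum of 3 and 5?",
    lang="Arabic",
)
```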
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model [3.3590922002216193]
We use model-agnostic meta-learning and leverage large language models (LLMs) to address this gap.
We first apply a meta-learning model with self-supervision, which results in improved model initialisation for rapid adaptation and cross-lingual transfer.
In parallel, we use LLMs' in-context learning capabilities to assess their performance accuracy across the Swahili mental health prediction tasks.
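The sketch below illustrates the meta-training idea with a generic first-order (Reptile-style) update over per-language tasks; it is an illustration of the family of methods, not the paper's exact MAML variant or model.

```python
# One meta-training round: adapt a copy of the model on each task's support
# set, then move the shared initialisation toward the adapted weights
# (first-order, Reptile-style). Generic illustration only.
import copy
import torch

def meta_step(model, tasks, loss_fn, inner_lr=1e-3, meta_lr=1e-2, inner_steps=3):
    init = {k: v.detach().clone() for k, v in model.state_dict().items()}
    delta = {k: torch.zeros_like(v) for k, v in init.items()}
    for support_x, support_y in tasks:  # each task: one language/condition
        fast = copy.deepcopy(model)
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):  # inner-loop adaptation on the support set
            opt.zero_grad()
            loss_fn(fast(support_x), support_y).backward()
            opt.step()
        with torch.no_grad():  # accumulate this task's parameter shift
            for k, v in fast.state_dict().items():
                delta[k] += (v - init[k]) / len(tasks)
    model.load_state_dict({k: init[k] + meta_lr * delta[k] for k in init})
```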
arXiv Detail & Related papers (2024-04-13T17:11:35Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT), and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
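A generic sketch of the iterated best response scheme: alternate a model update on the weight-averaged per-language loss with a re-weighting step that shifts mass toward the worst-performing languages. The exponentiated update and the `model.loss` helper are assumptions; the paper's uncertainty set may constrain the weights differently.

```python
# One round of iterated best response for distributionally robust training:
# (1) gradient step on the weight-averaged per-language loss, (2) re-weight
# languages toward the currently worst-performing ones. `model.loss` and the
# exponentiated update are stand-ins for the paper's exact formulation.
import torch

def dro_round(model, optimizer, per_lang_batches, weights, step_size=0.1):
    optimizer.zero_grad()
    losses = torch.stack([model.loss(batch) for batch in per_lang_batches])
    (weights * losses).sum().backward()  # minimize the weighted loss
    optimizer.step()
    with torch.no_grad():  # adversary's best response over the simplex
        new_w = weights * torch.exp(step_size * losses.detach())
        return new_w / new_w.sum()
```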
arXiv Detail & Related papers (2021-09-09T03:48:35Z)