Mental-LLM: Leveraging Large Language Models for Mental Health
Prediction via Online Text Data
- URL: http://arxiv.org/abs/2307.14385v4
- Date: Sun, 28 Jan 2024 16:54:03 GMT
- Title: Mental-LLM: Leveraging Large Language Models for Mental Health
Prediction via Online Text Data
- Authors: Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James
Hendler, Marzyeh Ghassemi, Anind K. Dey, Dakuo Wang
- Abstract summary: We present a comprehensive evaluation of multiple large language models (LLMs) on various mental health prediction tasks.
We conduct experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning.
Our best-finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%.
- Score: 42.965788205842465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in large language models (LLMs) have empowered a variety of
applications. However, there is still a significant gap in research when it
comes to understanding and enhancing the capabilities of LLMs in the field of
mental health. In this work, we present a comprehensive evaluation of multiple
LLMs on various mental health prediction tasks via online text data, including
Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of
experiments, covering zero-shot prompting, few-shot prompting, and instruction
fine-tuning. The results indicate a promising yet limited performance of LLMs
with zero-shot and few-shot prompt designs for mental health tasks. More
importantly, our experiments show that instruction finetuning can significantly
boost the performance of LLMs for all tasks simultaneously. Our best-finetuned
models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of
GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of
GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the
state-of-the-art task-specific language model. We also conduct an exploratory
case study on LLMs' capability on mental health reasoning tasks, illustrating
the promising capability of certain models such as GPT-4. We summarize our
findings into a set of action guidelines for potential methods to enhance LLMs'
capability for mental health tasks. Meanwhile, we also emphasize the important
limitations before achieving deployability in real-world mental health
settings, such as known racial and gender bias. We highlight the important
ethical risks accompanying this line of research.
Related papers
- Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs [8.920202114368843]
We present an investigative study on how Mental Sets influence the reasoning capabilities of LLMs.
Mental Sets refers to the tendency to persist with previously successful strategies, even when they become inefficient.
We compare the performance of LLM models like Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct and GPT-4o in the presence of mental sets.
arXiv Detail & Related papers (2025-01-21T02:29:15Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media [31.752563319585196]
Black box models are inflexible when switching between tasks, and their results typically lack explanations.
With the rise of large language models (LLMs), their flexibility has introduced new approaches to the field.
In this paper, we introduce the first multi-task Chinese Social Media Interpretable Mental Health Instructions dataset, consisting of 9K samples.
We also propose MentalGLM series models, the first open-source LLMs designed for explainable mental health analysis targeting Chinese social media.
arXiv Detail & Related papers (2024-10-14T09:29:27Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input.
GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care [0.18416014644193068]
Pre-trained Language Models (PLMs) have the potential to transform mental health support.
This study evaluates the effectiveness of PLMs for classification of Questions and Answers in the domain of mental health care.
arXiv Detail & Related papers (2024-06-23T00:11:07Z) - A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health [42.711913023646915]
We propose a novel framework for evaluating the nuanced conversation abilities of Large Language Models (LLMs)
Within it, we develop a series of quantitative metrics developed from literature on using psychotherapy conversation analysis literature.
We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, through a verified mental health dataset.
arXiv Detail & Related papers (2024-03-08T23:46:37Z) - Benchmarking Hallucination in Large Language Models based on
Unanswerable Math Word Problem [58.3723958800254]
Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks.
They are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination.
This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP)
arXiv Detail & Related papers (2024-03-06T09:06:34Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Towards Interpretable Mental Health Analysis with Large Language Models [27.776003210275608]
We evaluate the mental health analysis and emotional reasoning ability of large language models (LLMs) on 11 datasets across 5 tasks.
Based on prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions.
We convey strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations.
arXiv Detail & Related papers (2023-04-06T19:53:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.