Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation
- URL: http://arxiv.org/abs/2502.17899v1
- Date: Tue, 25 Feb 2025 06:53:00 GMT
- Title: Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation
- Authors: Tong Li, Shu Yang, Junchao Wu, Jiyao Wei, Lijie Hu, Mengdi Li, Derek F. Wong, Joshua R. Oltmanns, Di Wang
- Abstract summary: We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks. We find that current models struggle significantly with detecting implicit suicidal ideation and providing appropriate support.
- Score: 26.039402946157782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive evaluation framework for assessing Large Language Models' (LLMs) capabilities in suicide prevention, focusing on two critical aspects: the Identification of Implicit Suicidal ideation (IIS) and the Provision of Appropriate Supportive responses (PAS). We introduce \ourdata, a novel dataset of 1,308 test cases built upon psychological frameworks including D/S-IAT and Negative Automatic Thinking, alongside real-world scenarios. Through extensive experiments with 8 widely used LLMs under different contextual settings, we find that current models struggle significantly with detecting implicit suicidal ideation and providing appropriate support, highlighting crucial limitations in applying LLMs to mental health contexts. Our findings underscore the need for more sophisticated approaches in developing and evaluating LLMs for sensitive psychological applications.
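The paper's evaluation code is not reproduced here; the following is a minimal sketch of how a prompt-based check for the IIS task might be wired up, assuming an OpenAI-style chat-completions client, a placeholder model name, and invented example items (none of which are taken from \ourdata).

```python
# Minimal sketch of a prompt-based Identification of Implicit Suicidal
# ideation (IIS) check. Client, model name, prompt, and test cases are
# illustrative assumptions, not the authors' released code or data.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a mental-health triage assistant. Read the user's message and "
    "answer with exactly one word: YES if it contains implicit suicidal "
    "ideation, NO otherwise."
)

def identifies_implicit_ideation(message: str, model: str = "gpt-4o-mini") -> bool:
    """Return True if the model flags the message as implicit suicidal ideation."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

# Hypothetical test cases in the spirit of the dataset (not actual items).
test_cases = [
    {"text": "Lately I feel like everyone would be better off without me.", "label": True},
    {"text": "Work has been exhausting, but the weekend hike really helped.", "label": False},
]

correct = sum(identifies_implicit_ideation(c["text"]) == c["label"] for c in test_cases)
print(f"Identification accuracy: {correct}/{len(test_cases)}")
```

A PAS-style evaluation would presumably replace the yes/no instruction with a rubric for judging the appropriateness of a free-form supportive response rather than a binary label.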
Related papers
- Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges [34.10494503049667]
Large language models (LLMs) are increasingly applied to outpatient referral tasks across healthcare systems.
There is a lack of standardized evaluation criteria to assess their effectiveness.
We propose a comprehensive evaluation framework specifically designed for such systems.
arXiv Detail & Related papers (2025-03-11T11:05:42Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - SouLLMate: An Adaptive LLM-Driven System for Advanced Mental Health Support and Assessment, Based on a Systematic Application Survey [9.146311285410631]
Mental health issues significantly impact individuals' daily lives, yet many do not receive the help they need even with available online resources.
This study aims to provide accessible, stigma-free, personalized, and real-time mental health support through cutting-edge AI technologies.
arXiv Detail & Related papers (2024-10-06T17:11:29Z) - Attention Heads of Large Language Models: A Survey [10.136767972375639]
We aim to demystify the internal reasoning processes of Large Language Models (LLMs) by systematically exploring the roles and mechanisms of attention heads. We first introduce a novel four-stage framework inspired by the human thought process: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. We analyze the experimental methodologies used to discover these special heads, dividing them into two categories: Modeling-Free and Modeling-Required methods.
arXiv Detail & Related papers (2024-09-05T17:59:12Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.
Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.
Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - A Dual-Prompting for Interpretable Mental Health Language Models [11.33857985668663]
The CLPsych 2024 Shared Task aims to enhance the interpretability of Large Language Models (LLMs).
We propose a dual-prompting approach: (i) Knowledge-aware evidence extraction by leveraging the expert identity and a suicide dictionary with a mental health-specific LLM; and (ii) summarization by employing an LLM-based consistency evaluator.
arXiv Detail & Related papers (2024-02-20T06:18:02Z) - MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation [60.65820977963331]
We introduce a novel evaluation paradigm for Large Language Models (LLMs).
This paradigm shifts the emphasis from result-oriented assessments, which often neglect the reasoning process, to a more comprehensive evaluation.
By applying this paradigm in the GSM8K dataset, we have developed the MR-GSM8K benchmark.
arXiv Detail & Related papers (2023-12-28T15:49:43Z) - PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models [34.09419351705938]
This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating Large Language Models (LLMs).
This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks.
arXiv Detail & Related papers (2023-11-15T18:32:27Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations better reveal a model's overall grasp of language, particularly its proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework in which multiple agents argue back and forth in a "tit for tat" manner while a judge manages the debate process and extracts a final solution; a minimal sketch of this setup appears after this list.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
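As referenced above, here is a minimal, illustrative sketch of a two-agent debate with a judge, in the spirit of the MAD framework. The client, model name, prompts, and round count are assumptions made for illustration, not the authors' released implementation.

```python
# Illustrative two-agent "tit for tat" debate with a judge.
# Model name, prompts, and round count are placeholder assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

def ask(system: str, conversation: str) -> str:
    """Single LLM call with a role-specific system prompt."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0.7,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": conversation},
        ],
    )
    return resp.choices[0].message.content.strip()

def debate(question: str, rounds: int = 2) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(rounds):
        # Affirmative side argues; negative side rebuts the latest argument.
        aff = ask("You argue FOR your answer to the question.", transcript)
        transcript += f"Affirmative: {aff}\n"
        neg = ask("You challenge the previous argument and propose an alternative.", transcript)
        transcript += f"Negative: {neg}\n"
    # Judge reads the full debate and produces the final solution.
    return ask("You are the judge. Read the debate and state the final answer.", transcript)

print(debate("Is 0.999... equal to 1?"))
```

The intent of the design, as described in the abstract, is that the back-and-forth forces divergent lines of reasoning before the judge commits to an answer.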