Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
- URL: http://arxiv.org/abs/2403.17428v2
- Date: Mon, 10 Feb 2025 13:28:01 GMT
- Title: Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
- Authors: Jae-hee So, Joonhwan Chang, Eunji Kim, Junho Na, JiYeon Choi, Jy-yong Sohn, Byung-Hoon Kim, Sang Hui Chu,
- Abstract summary: This study focuses on enhancing psychiatric interviews by analyzing counseling data from North Korean defectors.
The study investigates whether LLMs can (1) identify parts of conversations that suggest psychiatric symptoms and recognize those symptoms, and (2) summarize stressors and symptoms based on interview transcripts.
- Score: 13.77580842967173
- License:
- Abstract: Background: Advancements in large language models (LLMs) have opened new possibilities in psychiatric interviews, an underexplored area where LLMs could be valuable. This study focuses on enhancing psychiatric interviews by analyzing counseling data from North Korean defectors who have experienced trauma and mental health issues. Objective: The study investigates whether LLMs can (1) identify parts of conversations that suggest psychiatric symptoms and recognize those symptoms, and (2) summarize stressors and symptoms based on interview transcripts. Methods: LLMs are tasked with (1) extracting stressors from transcripts, (2) identifying symptoms and their corresponding sections, and (3) generating interview summaries using the extracted data. The transcripts were labeled by mental health experts for training and evaluation. Results: In the zero-shot inference setting using GPT-4 Turbo, 73 out of 102 segments demonstrated a recall mid-token distance d < 20 in identifying symptom-related sections. For recognizing specific symptoms, fine-tuning outperformed zero-shot inference, achieving an accuracy, precision, recall, and F1-score of 0.82. For the generative summarization task, LLMs using symptom and stressor information scored highly on G-Eval metrics: coherence (4.66), consistency (4.73), fluency (2.16), and relevance (4.67). Retrieval-augmented generation showed no notable performance improvement. Conclusions: LLMs, with fine-tuning or appropriate prompting, demonstrated strong accuracy (over 0.8) for symptom delineation and achieved high coherence (4.6+) in summarization. This study highlights their potential to assist mental health practitioners in analyzing psychiatric interviews.
Related papers
- Enhanced Large Language Models for Effective Screening of Depression and Anxiety [44.81045754697482]
This paper introduces a pipeline for synthesizing clinical interviews, resulting in 1,157 interactive dialogues (PsyInterview)
EmoScan distinguishes between coarse (e.g., anxiety or depressive disorders) and fine disorders (e.g., major depressive disorders) and conducts high-quality interviews.
arXiv Detail & Related papers (2025-01-15T12:42:09Z) - Investigating Large Language Models in Inferring Personality Traits from User Conversations [5.705775078773656]
Large Language Models (LLMs) are demonstrating remarkable human like capabilities across diverse domains.
This study evaluates whether GPT-4o and GPT-4o mini, can infer Big Five personality traits and generate BFI-10 item scores from user conversations.
arXiv Detail & Related papers (2025-01-13T18:09:58Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32% at run time.
arXiv Detail & Related papers (2024-07-03T17:08:52Z) - WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions [46.60244609728416]
Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a litmus test of a model's utility in clinical practice.
We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs)
We reveal four surprising results about LMs/LLMs.
arXiv Detail & Related papers (2024-06-17T19:50:40Z) - LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C)
arXiv Detail & Related papers (2024-06-09T09:03:11Z) - MentaLLaMA: Interpretable Mental Health Analysis on Social Media with
Large Language Models [28.62967557368565]
We build the first multi-task and multi-source interpretable mental health instruction dataset on social media, with 105K data samples.
We use expert-written few-shot prompts and collected labels to prompt ChatGPT and obtain explanations from its responses.
Based on the IMHI dataset and LLaMA2 foundation models, we train MentalLLaMA, the first open-source LLM series for interpretable mental health analysis.
arXiv Detail & Related papers (2023-09-24T06:46:08Z) - Automatically measuring speech fluency in people with aphasia: first
achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signalprocessingalgorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z) - Towards Interpretable Mental Health Analysis with Large Language Models [27.776003210275608]
We evaluate the mental health analysis and emotional reasoning ability of large language models (LLMs) on 11 datasets across 5 tasks.
Based on prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions.
We convey strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations.
arXiv Detail & Related papers (2023-04-06T19:53:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.