Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening
- URL: http://arxiv.org/abs/2507.05984v1
- Date: Tue, 08 Jul 2025 13:41:22 GMT
- Title: Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening
- Authors: Zhijun Guo, Alvina Lai, Julia Ive, Alexandru Petcu, Yutong Wang, Luyuan Qi, Johan H Thygesen, Kezhi Li,
- Abstract summary: HopeBot administers the Patient Health Questionnaire-9 (PHQ-9) using retrieval-augmented generation and real-time clarification.<n>In a within-subject study, 132 adults in the United Kingdom and China completed both self-administered and chatbots versions.<n>Overall, 87.1% expressed willingness to reuse or recommend HopeBot.
- Score: 48.355615275247786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Static tools like the Patient Health Questionnaire-9 (PHQ-9) effectively screen depression but lack interactivity and adaptability. We developed HopeBot, a chatbot powered by a large language model (LLM) that administers the PHQ-9 using retrieval-augmented generation and real-time clarification. In a within-subject study, 132 adults in the United Kingdom and China completed both self-administered and chatbot versions. Scores demonstrated strong agreement (ICC = 0.91; 45% identical). Among 75 participants providing comparative feedback, 71% reported greater trust in the chatbot, highlighting clearer structure, interpretive guidance, and a supportive tone. Mean ratings (0-10) were 8.4 for comfort, 7.7 for voice clarity, 7.6 for handling sensitive topics, and 7.4 for recommendation helpfulness; the latter varied significantly by employment status and prior mental-health service use (p < 0.05). Overall, 87.1% expressed willingness to reuse or recommend HopeBot. These findings demonstrate voice-based LLM chatbots can feasibly serve as scalable, low-burden adjuncts for routine depression screening.
Related papers
- Clean & Clear: Feasibility of Safe LLM Clinical Guidance [2.0194749607835014]
Clinical guidelines are central to safe evidence-based medicine in modern healthcare.<n>We developed an open-weight Llama-3.1-8B LLM to extract relevant information from the UCLH guidelines to answer questions.<n>73% of its responses rated as very relevant, showcasing a strong understanding of the clinical context.
arXiv Detail & Related papers (2025-03-26T19:36:43Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions [0.0699049312989311]
Patients with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition.
While Large Language Models (LLMs) have the potential to make topical mental health information more accessible and engaging, their black-box nature raises concerns about ethics and safety.
arXiv Detail & Related papers (2024-10-10T09:49:24Z) - Enhancing Depression Diagnosis with Chain-of-Thought Prompting [1.8532406942078647]
We theorize that using chain-of-thought (CoT) prompting to evaluate Patient Health Questionnaire-8 (PHQ-8) scores will improve the accuracy of the scores determined by AI models.
Our goal is to expand upon AI models' understanding of the intricacies of human conversation, allowing them to more effectively assess a patient's feelings and tone.
arXiv Detail & Related papers (2024-08-26T07:19:07Z) - How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? [0.0]
This study examines the reliability of AI chatbots, specifically GPT 4.0, Claude 3 Opus, and Gemini Ultra 1.0, in predicting diseases from patient complaints in the emergency department.
Results suggest that GPT 4.0 achieves high accuracy with increased few-shot data, while Gemini Ultra 1.0 performs well with fewer examples, and Claude 3 Opus maintains consistent performance.
arXiv Detail & Related papers (2024-05-21T22:00:13Z) - From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique [1.7592522344393486]
We develop a voice-capable bot in Farsi to guide users through Self-Attachment (SAT)
We collect a dataset of over 6,000 utterances and develop a novel sentiment-analysis module that classifies user sentiment into 12 classes, with accuracy above 92%.
Our platform was engaging to most users (75%), 72% felt better after the interactions, and 74% were satisfied with the SAT Teacher's performance.
arXiv Detail & Related papers (2023-10-13T19:09:31Z) - EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner.
We propose a method based on a transformer pretrained language model (T5)
We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z) - CheerBots: Chatbots toward Empathy and Emotionusing Reinforcement
Learning [60.348822346249854]
This study presents a framework whereby several empathetic chatbots are based on understanding users' implied feelings and replying empathetically for multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were finetuned by deep reinforcement learning.
To respond in an empathetic way, we develop a simulating agent, a Conceptual Human Model, as aids for CheerBots in training with considerations on changes in user's emotional states in the future to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z) - Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn
Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework included a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrated flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z) - NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA)
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA the system's performance accuracy ranged between 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.