Related papers: DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews

DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews

URL: http://arxiv.org/abs/2404.14463v1
Date: Mon, 22 Apr 2024 09:07:50 GMT
Title: DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews
Authors: Sergio Burdisso, Ernesto Reyes-Ramírez, Esaú Villatoro-Tello, Fernando Sánchez-Vega, Pastor López-Monroy, Petr Motlicek,
Abstract summary: Recent studies have reported enhanced performance when incorporating interviewer's prompts into the model. We discover that models using interviewer's prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked. We achieve a 0.90 F1 score by intentionally exploiting it, the highest result reported to date on this dataset using only textual information.
Score: 39.08557916089242
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating interviewer's prompts into the model. In this work, we hypothesize that this improvement might be mainly due to a bias present in these prompts, rather than the proposed architectures and methods. Through ablation experiments and qualitative analysis, we discover that models using interviewer's prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked, and use them as discriminative shortcuts to detect depressed participants. In contrast, models using participant responses gather evidence from across the entire interview. Finally, to highlight the magnitude of this bias, we achieve a 0.90 F1 score by intentionally exploiting it, the highest result reported to date on this dataset using only textual information. Our findings underline the need for caution when incorporating interviewers' prompts into models, as they may inadvertently learn to exploit targeted prompts, rather than learning to characterize the language and behavior that are genuinely indicative of the patient's mental health condition.

Related papers

CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech [13.530437810265814]
This paper presents CognoSpeak and its associated data collection efforts. CognoSpeak asks memory-probing long and short-term questions and administers cognitive tasks using a virtual agent on a mobile or web platform. It collects multimodal data such as audio and video along with a rich set of metadata from primary and secondary care, memory clinics and remote settings like people's homes.
arXiv Detail & Related papers (2025-01-10T07:13:42Z)
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
A BERT-Based Summarization approach for depression detection [1.7363112470483526]
Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed. Machine learning and artificial intelligence can autonomously detect depression indicators from diverse data sources. Our study proposes text summarization as a preprocessing technique to diminish the length and intricacies of input texts.
arXiv Detail & Related papers (2024-09-13T02:14:34Z)
HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection [11.984035389013426]
HiQuE is a novel depression detection framework that leverages the hierarchical relationship between primary and follow-up questions in clinical interviews. We conduct extensive experiments on the widely-used clinical interview data, DAIC-WOZ, where our model outperforms other state-of-the-art multimodal depression detection models.
arXiv Detail & Related papers (2024-08-07T09:23:01Z)
Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis [0.6062751776009752]
We propose a multimodal model capable of predicting Mild Cognitive Impairment and cognitive scores. The proposed model demonstrates the ability to transcribe and differentiate between languages used in the interviews. Our approach involves in-depth research to implement various features obtained from the proposed modalities.
arXiv Detail & Related papers (2024-06-11T17:59:31Z)
LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains. The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C)
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
Modeling User Preferences via Brain-Computer Interfacing [54.3727087164445]
We use Brain-Computer Interfacing technology to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience. We link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
arXiv Detail & Related papers (2024-05-15T20:41:46Z)
From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models [21.427976533706737]
We take a novel approach that leverages large language models to synthesize clinically useful insights from multi-sensor data. We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data relate to conditions like depression and anxiety. We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
arXiv Detail & Related papers (2023-11-21T23:53:27Z)
Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to-date that assesses 24 appraisal dimensions. CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z)
Explainable Depression Symptom Detection in Social Media [2.677715367737641]
We propose using transformer-based architectures to detect and explain the appearance of depressive symptom markers in the users' writings. Our natural language explanations enable clinicians to interpret the models' decisions based on validated symptoms.
arXiv Detail & Related papers (2023-10-20T17:05:27Z)
Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out the inability to infer behavioral conclusions from probing results. We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
arXiv Detail & Related papers (2020-06-01T15:00:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.