Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction
- URL: http://arxiv.org/abs/2505.23822v3
- Date: Wed, 23 Jul 2025 14:16:49 GMT
- Title: Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction
- Authors: Mai Ali, Christopher Lucasius, Tanmay P. Patel, Madison Aitken, Jacob Vorstman, Peter Szatmari, Marco Battaglia, Deepa Kundur,
- Abstract summary: We propose the treatment of patient speech data as a trimodal multimedia data source for depression detection.<n>Our proposed approach, featuring trimodal, longitudinal MTL is evaluated on the Depression Early Warning dataset.<n>It achieves a balanced accuracy of 70.8%, which is higher than each of the unimodal, single-task, and non-longitudinal methods.
- Score: 0.4517077427559345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech is a noninvasive digital phenotype that can offer valuable insights into mental health conditions, but it is often treated as a single modality. In contrast, we propose the treatment of patient speech data as a trimodal multimedia data source for depression detection. This study explores the potential of large language model-based architectures for speech-based depression prediction in a multimodal regime that integrates speech-derived text, acoustic landmarks, and vocal biomarkers. Adolescent depression presents a significant challenge and is often comorbid with multiple disorders, such as suicidal ideation and sleep disturbances. This presents an additional opportunity to integrate multi-task learning (MTL) into our study by simultaneously predicting depression, suicidal ideation, and sleep disturbances using the multimodal formulation. We also propose a longitudinal analysis strategy that models temporal changes across multiple clinical interactions, allowing for a comprehensive understanding of the conditions' progression. Our proposed approach, featuring trimodal, longitudinal MTL is evaluated on the Depression Early Warning dataset. It achieves a balanced accuracy of 70.8%, which is higher than each of the unimodal, single-task, and non-longitudinal methods.
Related papers
- Generating Medically-Informed Explanations for Depression Detection using LLMs [1.325953054381901]
Early detection of depression from social media data offers a valuable opportunity for timely intervention.<n>We propose LLM-MTD (Large Language Model for Multi-Task Depression Detection), a novel approach that combines the power of large language models with the crucial aspect of explainability.
arXiv Detail & Related papers (2025-03-18T19:23:22Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.<n>We propose a novel approach utilizing structured medical reasoning.<n>Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - Innovative Framework for Early Estimation of Mental Disorder Scores to Enable Timely Interventions [0.9297614330263184]
An advanced multimodal deep learning system for the automated classification of PTSD and depression is presented in this paper.<n>The proposed method achieves classification accuracies of 92% for depression and 93% for PTSD, outperforming traditional unimodal approaches.
arXiv Detail & Related papers (2025-02-06T10:57:10Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities [25.305909441170993]
Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals.<n>If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide.
arXiv Detail & Related papers (2024-07-08T17:00:51Z) - Empowering Psychotherapy with Large Language Models: Cognitive
Distortion Detection through Diagnosis of Thought Prompting [82.64015366154884]
We study the task of cognitive distortion detection and propose the Diagnosis of Thought (DoT) prompting.
DoT performs diagnosis on the patient's speech via three stages: subjectivity assessment to separate the facts and the thoughts; contrastive reasoning to elicit the reasoning processes supporting and contradicting the thoughts; and schema analysis to summarize the cognition schemas.
Experiments demonstrate that DoT obtains significant improvements over ChatGPT for cognitive distortion detection, while generating high-quality rationales approved by human experts.
arXiv Detail & Related papers (2023-10-11T02:47:21Z) - Individualized Dosing Dynamics via Neural Eigen Decomposition [51.62933814971523]
We introduce the Neural Eigen Differential Equation algorithm (NESDE)
NESDE provides individualized modeling, tunable generalization to new treatment policies, and fast, continuous, closed-form prediction.
We demonstrate the robustness of NESDE in both synthetic and real medical problems, and use the learned dynamics to publish simulated medical gym environments.
arXiv Detail & Related papers (2023-06-24T17:01:51Z) - Speech and the n-Back task as a lens into depression. How combining both
may allow us to isolate different core symptoms of depression [12.251313610613693]
We show that speech alterations are more strongly associated with subsets of key depression symptoms.
We present a novel large, cross-sectional, multi-modal dataset collected at Thymia.
We then present a set of experiments that highlight the association between different speech and n-Back markers at the PHQ-8 item level.
arXiv Detail & Related papers (2022-03-30T09:12:59Z) - Deep Multi-task Learning for Depression Detection and Prediction in
Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally.
Machine learning techniques have shown effective in enabling automated detection and prediction of depression for early intervention and treatment.
We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks.
arXiv Detail & Related papers (2020-12-05T05:14:14Z) - Multimodal Depression Severity Prediction from medical bio-markers using
Machine Learning Tools and Technologies [0.0]
Depression has been a leading cause of mental-health illnesses across the world.
Using behavioural cues to automate depression diagnosis and stage prediction in recent years has relatively increased.
The absence of labelled behavioural datasets and a vast amount of possible variations prove to be a major challenge in accomplishing the task.
arXiv Detail & Related papers (2020-09-11T20:44:28Z) - Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using
Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neuro-logical disease that affects about 1% of the population above 60 years old.
PD symptoms include tremor, rigidity and braykinesia.
We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.