Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges
- URL: http://arxiv.org/abs/2407.16804v2
- Date: Tue, 24 Jun 2025 13:40:09 GMT
- Title: Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges
- Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver
- Abstract summary: Multimodal machine learning (MML) is rapidly reshaping the way mental-health disorders are detected, characterized, and longitudinally monitored. This survey provides the first comprehensive, clinically grounded synthesis of MML for mental health.
- Score: 14.632649933582648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal machine learning (MML) is rapidly reshaping the way mental-health disorders are detected, characterized, and longitudinally monitored. Whereas early studies relied on isolated data streams -- such as speech, text, or wearable signals -- recent research has converged on architectures that integrate heterogeneous modalities to capture the rich, complex signatures of psychiatric conditions. This survey provides the first comprehensive, clinically grounded synthesis of MML for mental health. We (i) catalog 26 public datasets spanning audio, visual, physiological signals, and text modalities; (ii) systematically compare transformer, graph, and hybrid-based fusion strategies across 28 models, highlighting trends in representation learning and cross-modal alignment. Beyond summarizing current capabilities, we interrogate open challenges: data governance and privacy, demographic and intersectional fairness, evaluation explainability, and the complexity of mental health disorders in multimodal settings. By bridging methodological innovation with psychiatric utility, this survey aims to orient both ML researchers and mental-health practitioners toward the next generation of trustworthy, multimodal decision-support systems.
Related papers
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z) - EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding [71.31963197992998]
We introduce a large-scale, code-based competition comprising two challenges. The Transfer Challenge asks participants to build and test a model that can zero-shot decode new tasks and new subjects from their EEG data. The Psychopathology factor prediction Challenge asks participants to infer subject measures of mental health from EEG data.
arXiv Detail & Related papers (2025-06-23T21:25:19Z) - A Survey of Large Language Models in Mental Health Disorder Detection on Social Media [6.494919397864379]
This paper aims to explore the potential of Large Language Models (LLMs) for mental health problem detection on social media.
The paper focuses on the most common psychological disorders, such as depression and anxiety, but also covers psychotic disorders and externalizing disorders.
arXiv Detail & Related papers (2025-04-03T17:43:14Z) - Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment [0.8458496687170665]
The increasing global prevalence of mental disorders, such as depression and PTSD, requires objective and scalable diagnostic tools. This paper investigates the potential of multimodal machine learning to address these challenges, leveraging the complementary information available in text, audio, and video data. We explore data-level, feature-level, and decision-level fusion techniques, including a novel integration of Large Language Model predictions.
arXiv Detail & Related papers (2025-04-02T14:19:06Z) - Early Detection of Mental Health Issues Using Social Media Posts [0.0]
Social media platforms, like Reddit, represent a rich source of user-generated content. We propose a multi-modal deep learning framework that integrates linguistic and temporal features for early detection of mental health crises.
arXiv Detail & Related papers (2025-03-06T23:08:08Z) - A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions [13.17064228097947]
Large Language Models (LLMs) offer potential for addressing this gap due to their ability to handle extensive context and multi-turn reasoning. This review introduces a conceptual taxonomy dividing psychotherapy into interconnected stages (assessment, diagnosis, and treatment) to systematically examine LLM advancements and challenges.
arXiv Detail & Related papers (2025-02-16T12:18:40Z) - Multimodal Data-Driven Classification of Mental Disorders: A Comprehensive Approach to Diagnosing Depression, Anxiety, and Schizophrenia [0.9297614330263184]
This study investigates the potential of multimodal data integration to diagnose mental diseases like schizophrenia, depression, and anxiety. Using Apache Spark and convolutional neural networks (CNNs), a data-driven classification pipeline has been developed for big-data environments. The importance of coherence features is highlighted by comparative analysis, which shows significant improvement in classification accuracy and robustness.
arXiv Detail & Related papers (2025-02-06T10:30:13Z) - Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities [61.633126163190724]
Mental illness is a widespread and debilitating condition with substantial societal and personal costs.
Recent advances in Artificial Intelligence (AI) hold great potential for recognizing and addressing conditions such as depression, anxiety disorder, bipolar disorder, schizophrenia, and post-traumatic stress disorder.
Privacy concerns, including the risk of sensitive data leakage from datasets and trained models, remain a critical barrier to deploying these AI systems in real-world clinical settings.
arXiv Detail & Related papers (2025-02-01T15:10:02Z) - Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning [151.4060202671114]
Multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing vision-language tasks.
This paper introduces a novel bottom-up reasoning framework to address hallucinations in MLLMs.
Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge.
arXiv Detail & Related papers (2024-12-15T09:10:46Z) - Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models [0.9913418444556487]
Mental health disorders, such as depression and anxiety, often co-occur. Social media datasets typically focus on single-disorder labels. This paper proposes a novel methodology for cleaning, sampling, labeling, and combining data to create versatile multi-label datasets.
arXiv Detail & Related papers (2024-12-05T01:33:03Z) - SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques [9.146311285410631]
Mental health issues significantly impact individuals' daily lives, yet many do not receive the help they need even with available online resources.
This study aims to provide diverse, accessible, stigma-free, personalized, and real-time mental health support through cutting-edge AI technologies.
arXiv Detail & Related papers (2024-10-17T22:04:32Z) - From Text to Multimodality: Exploring the Evolution and Impact of Large Language Models in Medical Practice [12.390859712280328]
Large Language Models (LLMs) have rapidly evolved from text-based systems to multimodal platforms.
We examine the current landscape of MLLMs in healthcare, analyzing their applications across clinical decision support, medical imaging, patient engagement, and research.
arXiv Detail & Related papers (2024-09-14T02:35:29Z) - Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook [0.7689629183085726]
We conduct a thorough survey to explore the intersection of data science, artificial intelligence, and mental healthcare.
A significant portion of the population actively engages in online social media platforms, creating a vast repository of personal data.
The paper navigates through traditional diagnostic methods, state-of-the-art data- and AI-driven research studies, and the emergence of explainable AI (XAI) models for mental healthcare.
arXiv Detail & Related papers (2024-06-10T02:51:16Z) - Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z) - Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z) - Challenges of Large Language Models for Mental Health Counseling [4.604003661048267]
The global mental health crisis is looming with a rapid increase in mental disorders, limited resources, and the social stigma of seeking treatment.
The application of large language models (LLMs) in the mental health domain raises concerns regarding the accuracy, effectiveness, and reliability of the information provided.
This paper investigates the major challenges associated with the development of LLMs for psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness.
arXiv Detail & Related papers (2023-11-23T08:56:41Z) - Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis [71.2022403915147]
We introduce MEDUSA, a multi-scale encoder-decoder self-attention mechanism tailored for medical image analysis.
We obtain state-of-the-art performance on challenging medical image analysis benchmarks including COVIDx, RSNA RICORD, and RSNA Pneumonia Challenge.
arXiv Detail & Related papers (2021-10-12T15:05:15Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO), for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z) - Learning Binary Semantic Embedding for Histology Image Classification and Retrieval [56.34863511025423]
We propose a novel method for Learning Binary Semantic Embedding (LBSE).
Based on the efficient and effective embedding, classification and retrieval are performed to provide interpretable computer-assisted diagnosis for histology images.
Experiments conducted on three benchmark datasets validate the superiority of LBSE under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:36:44Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z) - Hierarchical Reinforcement Learning for Automatic Disease Diagnosis [52.111516253474285]
We propose to integrate a two-level hierarchical policy structure into the dialogue system for policy learning.
The proposed policy structure is capable of handling diagnosis problems that involve a large number of diseases and symptoms.
arXiv Detail & Related papers (2020-04-29T15:02:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.