Related papers: Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study

URL: http://arxiv.org/abs/2512.20948v1
Date: Wed, 24 Dec 2025 05:07:07 GMT
Title: Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study
Authors: Zhongren Dong, Haotian Guo, Weixiang Xu, Huan Zhao, Zixing Zhang
Abstract summary: Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities. We propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan.
Score: 18.4135590766724
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.

Related papers

Cross-Linguistic Persona-Driven Data Synthesis for Robust Multimodal Cognitive Decline Detection [20.599682298329213]
We introduce SynCog, a novel framework integrating controllable zero-shot multimodal data synthesis with Chain-of-Thought deduction fine-tuning. This generative paradigm enables the rapid, zero-shot expansion of clinical corpora across diverse languages. Experiments on the ADReSS and ADReSSo benchmarks demonstrate that augmenting limited clinical data with synthetic phenotypes yields competitive diagnostic performance.
arXiv Detail & Related papers (2026-02-08T14:10:05Z)
R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression [63.97617759805451]
Early detection of Alzheimer's disease requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic susceptibility. We introduce R-GenIMA, an interpretable multimodal large language model that couples a novel ROI-wise vision transformer with genetic prompting. R-GenIMA achieves state-of-the-art performance in four-way classification across normal cognition, subjective memory concerns, mild cognitive impairment, and AD.
arXiv Detail & Related papers (2025-12-22T02:54:10Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
Cross-modal Causal Intervention for Alzheimer's Disease Prediction [13.584994367762398]
We propose a visual-language causality-inspired framework named Cross-modal Causal Intervention with Mediator for Alzheimer's Disease Diagnosis (MediAD). Our framework implicitly mitigates the effect of both observable and unobservable confounders through a unified causal intervention method.
arXiv Detail & Related papers (2025-07-18T14:21:24Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
A Layered Multi-Expert Framework for Long-Context Mental Health Assessments [9.095637530998134]
Stacked Multi-Model Reasoning (SMMR) is a layered framework that leverages multiple models as coequal 'experts'. We evaluate SMMR on the DAIC-WOZ depression-screening dataset and 48 curated case studies with psychiatric diagnoses. By harnessing diverse 'second opinions', SMMR mitigates hallucinations, captures subtle clinical nuances, and enhances reliability in high-stakes mental health assessments.
arXiv Detail & Related papers (2025-01-20T03:22:19Z)
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [118.75449542080746]
This paper presents the first systematic investigation of hallucinations in large multimodal models (LMMs). Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning.
arXiv Detail & Related papers (2024-10-16T17:59:02Z)
Multimodal Audio-based Disease Prediction with Transformer-based Hierarchical Fusion Network [6.175036031779841]
Multimodal fusion has proven effective in enhancing diagnostic performance. We propose a transformer-based hierarchical fusion network designed for general multimodal audio-based disease prediction. Our model achieves state-of-the-art performance in predicting three diseases: COVID-19, Parkinson's disease, and pathological dysarthria.
arXiv Detail & Related papers (2024-10-11T22:37:52Z)
An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI). The objective of the work was to capture structural and functional modulations of the brain, relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z)
Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis [0.6062751776009752]
We propose a multimodal model capable of predicting Mild Cognitive Impairment and cognitive scores. The proposed model demonstrates the ability to transcribe and differentiate between languages used in the interviews. Our approach involves in-depth research to implement various features obtained from the proposed modalities.
arXiv Detail & Related papers (2024-06-11T17:59:31Z)
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
