A Layered Multi-Expert Framework for Long-Context Mental Health Assessments
- URL: http://arxiv.org/abs/2501.13951v3
- Date: Fri, 19 Sep 2025 17:50:58 GMT
- Title: A Layered Multi-Expert Framework for Long-Context Mental Health Assessments
- Authors: Jinwen Tang, Qiming Guo, Wenbo Sun, Yi Shang
- Abstract summary: Stacked Multi-Model Reasoning (SMMR) is a layered framework that leverages multiple models as coequal 'experts'. We evaluate SMMR on the DAIC-WOZ depression-screening dataset and 48 curated case studies with psychiatric diagnoses. By harnessing diverse 'second opinions', SMMR mitigates hallucinations, captures subtle clinical nuances, and enhances reliability in high-stakes mental health assessments.
- Score: 9.095637530998134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-form mental health assessments pose unique challenges for large language models (LLMs), which often exhibit hallucinations or inconsistent reasoning when handling extended, domain-specific contexts. We introduce Stacked Multi-Model Reasoning (SMMR), a layered framework that leverages multiple LLMs and specialized smaller models as coequal 'experts'. Early layers isolate short, discrete subtasks, while later layers integrate and refine these partial outputs through more advanced long-context models. We evaluate SMMR on the DAIC-WOZ depression-screening dataset and 48 curated case studies with psychiatric diagnoses, demonstrating consistent improvements over single-model baselines in terms of accuracy, F1-score, and PHQ-8 error reduction. By harnessing diverse 'second opinions', SMMR mitigates hallucinations, captures subtle clinical nuances, and enhances reliability in high-stakes mental health assessments. Our findings underscore the value of multi-expert frameworks for more trustworthy AI-driven screening.
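The layering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which models are used or how their outputs are combined, so the expert functions, the `Expert` interface, and the median aggregation below are all hypothetical stand-ins chosen only to show the data flow (early layers produce independent estimates, a later layer refines them).

```python
from statistics import median
from typing import Callable, List, Sequence

# Hypothetical expert interface (not from the paper): an expert receives the
# transcript plus the previous layer's opinions and returns a PHQ-8 estimate.
Expert = Callable[[str, Sequence[float]], float]

PHQ8_MAX = 24.0  # PHQ-8 totals range from 0 to 24 (8 items scored 0-3)


def keyword_expert(transcript: str, prior: Sequence[float]) -> float:
    """Toy stand-in for a small specialized model: counts symptom keywords."""
    keywords = ("tired", "hopeless", "sleep", "appetite")
    hits = sum(transcript.lower().count(word) for word in keywords)
    return min(PHQ8_MAX, 3.0 * hits)


def length_expert(transcript: str, prior: Sequence[float]) -> float:
    """Toy stand-in for a second first-layer model (uses word count only)."""
    return min(PHQ8_MAX, float(len(transcript.split()) // 5))


def refining_expert(transcript: str, prior: Sequence[float]) -> float:
    """Toy stand-in for a later-layer model that reconciles earlier opinions."""
    return float(median(prior)) if prior else 0.0


def run_stack(transcript: str, layers: Sequence[Sequence[Expert]]) -> float:
    """Pass the transcript through each layer; each layer sees the prior one's outputs."""
    if not layers or any(len(layer) == 0 for layer in layers):
        raise ValueError("every layer needs at least one expert")
    opinions: List[float] = []
    for layer in layers:
        opinions = [expert(transcript, opinions) for expert in layer]
    # Assumption: the final answer is the median of the last layer's opinions.
    return float(median(opinions))


if __name__ == "__main__":
    text = "I feel tired and hopeless and I cannot sleep"
    stack = [[keyword_expert, length_expert], [refining_expert]]
    print(run_stack(text, stack))
```

In a real system each stand-in would be replaced by a call to an actual model, and the aggregation rule would be whatever the paper specifies.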
Related papers
- RE-MCDF: Closed-Loop Multi-Expert LLM Reasoning for Knowledge-Grounded Clinical Diagnosis [11.973474883672282]
We propose RE-MCDF, a relation-enhanced multi-expert clinical diagnosis framework. We show that RE-MCDF consistently outperforms state-of-the-art baselines in complex diagnostic scenarios.
arXiv Detail & Related papers (2026-02-01T15:53:27Z)
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
- M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding [66.78251988482222]
Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning. Current benchmarks for medical image understanding generally focus on the final answer while ignoring the reasoning path. M3CoTBench aims to foster the development of transparent, trustworthy, and diagnostically accurate AI systems for healthcare.
arXiv Detail & Related papers (2026-01-13T17:42:27Z)
- Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting [37.57009831483529]
Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation. Our framework restructures generation into two distinct components: a think block for detailed findings and an answer block for structured disease labels.
arXiv Detail & Related papers (2026-01-06T14:17:44Z)
- Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study [18.4135590766724]
Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities. We propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan.
arXiv Detail & Related papers (2025-12-24T05:07:07Z)
- MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
- Psychiatry-Bench: A Multi-Task Benchmark for LLMs in Psychiatry [1.2879523047871226]
PsychiatryBench is a rigorously curated benchmark grounded exclusively in expert-validated psychiatric textbooks and casebooks. PsychiatryBench comprises eleven distinct question-answering tasks, ranging from diagnostic reasoning and treatment planning to longitudinal follow-up, management planning, clinical approach, sequential case analysis, and multiple-choice/extended matching formats, totaling over 5,300 expert-annotated items.
arXiv Detail & Related papers (2025-09-07T20:57:24Z)
- MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models [28.873959594226605]
Automated depression diagnosis aims to analyze multimodal information from interview videos to predict participants' depression scores. Previous studies often lack clear explanations of how these scores were determined, limiting their adoption in clinical practice. We propose a novel multimodal large language model (MLlm-DR) that can understand multimodal information inputs and supports explainable depression diagnosis.
arXiv Detail & Related papers (2025-07-08T01:56:39Z)
- MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM [58.2298313720146]
Multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv Detail & Related papers (2025-05-30T05:54:36Z)
- Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the "catfish effect" in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z)
- Dual-domain Multi-path Self-supervised Diffusion Model for Accelerated MRI Reconstruction [9.601655294394313]
Recent advancements in deep learning, particularly diffusion models, have improved accelerated MRI reconstruction.
We propose Dual-domain Multi-path Self-supervised Diffusion Model (DMSM) to overcome these challenges.
Unlike traditional diffusion-based models, DMSM eliminates the dependency on training from fully sampled data.
arXiv Detail & Related papers (2025-03-24T16:10:51Z)
- A-IDE : Agent-Integrated Denoising Experts [0.46040036610482665]
We introduce the Agent-Integrated Denoising Experts (A-IDE) framework, which integrates three anatomical region-specialized RED-CNN models.
A-IDE achieves superior performance in RMSE, PSNR, and SSIM compared to a single unified denoiser.
arXiv Detail & Related papers (2025-03-21T01:26:54Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning [3.3482359447109866]
Large Language Models (LLMs) have attained human-level accuracy on medical question-answer (QA) benchmarks.
Their limitations in navigating open-ended clinical scenarios have recently been shown.
We present the medical abstraction and reasoning corpus (M-ARC).
We find that LLMs, including current state-of-the-art o1 and Gemini models, perform poorly compared to physicians on M-ARC.
arXiv Detail & Related papers (2025-02-05T18:14:27Z)
- MVICAD2: Multi-View Independent Component Analysis with Delays and Dilations [61.59658203704757]
We propose Multi-View Independent Component Analysis with Delays and Dilations (MVICAD2), which allows sources to differ across subjects in both temporal delays and dilations. We present a model with identifiable sources, derive an approximation of its likelihood in closed form, and use regularization and optimization techniques to enhance performance.
arXiv Detail & Related papers (2025-01-13T15:47:02Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
- HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for radiology report generation. Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z)
- The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [118.75449542080746]
This paper presents the first systematic investigation of hallucinations in large multimodal models (LMMs).
Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations.
Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning.
arXiv Detail & Related papers (2024-10-16T17:59:02Z)
- LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD).
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
- RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning [14.366349078707263]
This work introduces RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization.
arXiv Detail & Related papers (2024-02-19T06:57:02Z)
- SF2Former: Amyotrophic Lateral Sclerosis Identification From Multi-center MRI Data Using Spatial and Frequency Fusion Transformer [3.408266725482757]
Amyotrophic Lateral Sclerosis (ALS) is a complex neurodegenerative disorder involving motor neuron degeneration.
Deep learning has turned into a prominent class of machine learning programs in computer vision.
This study introduces a framework named SF2Former that leverages vision transformer architecture's power to distinguish the ALS subjects from the control group.
arXiv Detail & Related papers (2023-02-21T18:16:20Z)
- Self-supervised multimodal neuroimaging yields predictive representations for a spectrum of Alzheimer's phenotypes [27.331511924585023]
This work presents a novel multi-scale coordinated framework for learning multiple representations from multimodal neuroimaging data.
We propose a general taxonomy of informative inductive biases to capture unique and joint information in multimodal self-supervised fusion.
We show that self-supervised models reveal disorder-relevant brain regions and multimodal links without access to the labels during pre-training.
arXiv Detail & Related papers (2022-09-07T01:37:19Z)
- MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis [71.2022403915147]
We introduce MEDUSA, a multi-scale encoder-decoder self-attention mechanism tailored for medical image analysis.
We obtain state-of-the-art performance on challenging medical image analysis benchmarks including COVIDx, RSNA RICORD, and RSNA Pneumonia Challenge.
arXiv Detail & Related papers (2021-10-12T15:05:15Z)
- On self-supervised multi-modal representation learning: An application to Alzheimer's disease [21.495288589801476]
Introspection of deep supervised predictive models trained on functional and structural brain imaging may uncover novel markers of Alzheimer's disease (AD).
Deep unsupervised and, recently, contrastive self-supervised approaches, not biased to classification, are better candidates for the task.
arXiv Detail & Related papers (2020-12-25T19:51:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.