MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric Diagnosis
- URL: http://arxiv.org/abs/2602.03340v1
- Date: Tue, 03 Feb 2026 10:03:35 GMT
- Title: MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric Diagnosis
- Authors: Xiao Sun, Yuming Yang, Junnan Zhu, Jiang Zhong, Xinyu Zhou, Kaiwen Wei,
- Abstract summary: MentalSeek-Dx Bench is the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings.<n>It comprises 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines.<n>MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.
- Score: 27.839664095206857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mental health disorders represent a burgeoning global public health challenge. While Large Language Models (LLMs) have demonstrated potential in psychiatric assessment, their clinical utility is severely constrained by benchmarks that lack ecological validity and fine-grained diagnostic supervision. To bridge this gap, we introduce \textbf{MentalDx Bench}, the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings. Comprising 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines, the benchmark covers 76 disorders across 16 diagnostic categories. Evaluation of 18 LLMs reveals a critical \textit{paradigm misalignment}: strong performance at coarse diagnostic categorization contrasts with systematic failure at disorder-level diagnosis, underscoring a gap between pattern-based modeling and clinical hypothetico-deductive reasoning. In response, we propose \textbf{MentalSeek-Dx}, a medical-specialized LLM trained to internalize this clinical reasoning process through supervised trajectory construction and curriculum-based reinforcement learning. Experiments on MentalDx Bench demonstrate that MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.
Related papers
- MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation [5.601620793903095]
We propose MIND, a unified inquiry--diagnosis reinforcement learning framework for psychiatric consultation.<n>Specifically, we build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that summarizes dialogue context into clinical retrieval states.<n>Building on this foundation, MIND enforces explicit clinical reasoning with rubric-based process rewards to provide fine-grained supervision over intermediate decision steps.
arXiv Detail & Related papers (2026-03-04T03:05:38Z) - MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models [28.184599359142307]
MentalBench is a benchmark for evaluating psychiatric diagnostic decision-making in large language models (LLMs)<n>At the core of MentalBench is MentalKG, a psychiatrist-built and validated knowledge graph encoding DSM-5 diagnostic criteria and differential diagnostic rules for 23 psychiatric disorders.
arXiv Detail & Related papers (2026-02-13T12:21:33Z) - LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis [14.82377002030236]
Mental disorders are highly prevalent worldwide.<n>The shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment.<n>We present LingxiDiagBench, a large-scale multi-agent benchmark.
arXiv Detail & Related papers (2026-02-10T03:46:05Z) - AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations [7.061237517845673]
Mental health disorders remain among the leading cause of disability worldwide.<n>Conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed.<n>In primary care settings, studies show that providers misidentify depression or anxiety in over 60% of cases.
arXiv Detail & Related papers (2025-10-16T17:50:04Z) - Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs)<n>Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a (oral) examination in medical training.<n>Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z) - Interpretable Neuropsychiatric Diagnosis via Concept-Guided Graph Neural Networks [56.75602443936853]
One in five adolescents currently live with a diagnosed mental or behavioral health condition, such as anxiety, depression, or conduct disorder.<n>While prior works use graph neural network (GNN) approaches for disorder prediction, they remain black-boxes, limiting their reliability and clinical translation.<n>In this work, we propose a concept-based diagnosis framework that that encodes interpretable functional connectivity concepts.<n>Our design ensures predictions through clinically meaningful connectivity patterns, enabling both interpretability and strong predictive performance.
arXiv Detail & Related papers (2025-10-02T19:38:46Z) - Unveiling the Landscape of Clinical Depression Assessment: From Behavioral Signatures to Psychiatric Reasoning [43.26860213892083]
Depression is a widespread mental disorder that affects millions worldwide.<n>Most studies rely on limited or non-clinically validated data, and often prioritize complex model design over real-world effectiveness.<n>We introduce C-MIND, a clinical neuropsychiatric multimodal diagnosis dataset collected over two years from real hospital visits.<n>Each participant completes three structured psychiatric tasks and receives a final diagnosis from expert clinicians, with informative audio, video, transcript, and functional near-infrared spectroscopy (fNIRS) signals recorded.
arXiv Detail & Related papers (2025-08-06T15:13:24Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis.<n>MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling [50.83055329849865]
PsyLLM is a large language model designed to integrate diagnostic and therapeutic reasoning for mental health counseling.<n>It processes real-world mental health posts from Reddit and generates multi-turn dialogue structures.<n>Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2025-05-21T16:24:49Z) - MAGI: Multi-Agent Guided Interview for Psychiatric Assessment [50.6150986786028]
We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational navigation.<n>We show that MAGI advances LLM- assisted mental health assessment by combining clinical rigor, conversational adaptability, and explainable reasoning.
arXiv Detail & Related papers (2025-04-25T11:08:27Z) - PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice [20.166682569070073]
Large Language Models (LLMs) offer potential solutions to address problems such as shortage of medical resources and low diagnostic consistency in psychiatric clinical practice.<n>We propose a benchmarking system, PsychBench, to evaluate the practical performance of LLMs in psychiatric clinical settings.<n>We show that while existing models demonstrate significant potential, they are not yet adequate as decision-making tools in psychiatric clinical practice.
arXiv Detail & Related papers (2025-02-28T12:17:41Z) - SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy [45.2233252981348]
Large Language Models (LLMs) have been shown to encode clinical knowledge.<n>We present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models.<n>We show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
arXiv Detail & Related papers (2024-07-03T11:02:12Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic
Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.