Related papers: MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric Diagnosis

URL: http://arxiv.org/abs/2602.03340v1
Date: Tue, 03 Feb 2026 10:03:35 GMT
Title: MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric Diagnosis
Authors: Xiao Sun, Yuming Yang, Junnan Zhu, Jiang Zhong, Xinyu Zhou, Kaiwen Wei,
Abstract summary: MentalDx Bench is the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings. It comprises 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines. MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.
Score: 27.839664095206857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mental health disorders represent a burgeoning global public health challenge. While Large Language Models (LLMs) have demonstrated potential in psychiatric assessment, their clinical utility is severely constrained by benchmarks that lack ecological validity and fine-grained diagnostic supervision. To bridge this gap, we introduce MentalDx Bench, the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings. Comprising 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines, the benchmark covers 76 disorders across 16 diagnostic categories. Evaluation of 18 LLMs reveals a critical paradigm misalignment: strong performance at coarse diagnostic categorization contrasts with systematic failure at disorder-level diagnosis, underscoring a gap between pattern-based modeling and clinical hypothetico-deductive reasoning. In response, we propose MentalSeek-Dx, a medical-specialized LLM trained to internalize this clinical reasoning process through supervised trajectory construction and curriculum-based reinforcement learning. Experiments on MentalDx Bench demonstrate that MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.

Related papers

MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation [5.601620793903095]
We propose MIND, a unified inquiry-diagnosis reinforcement learning framework for psychiatric consultation. Specifically, we build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that summarizes dialogue context into clinical retrieval states. Building on this foundation, MIND enforces explicit clinical reasoning with rubric-based process rewards to provide fine-grained supervision over intermediate decision steps.
arXiv Detail & Related papers (2026-03-04T03:05:38Z)
MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models [28.184599359142307]
MentalBench is a benchmark for evaluating psychiatric diagnostic decision-making in large language models (LLMs). At the core of MentalBench is MentalKG, a psychiatrist-built and validated knowledge graph encoding DSM-5 diagnostic criteria and differential diagnostic rules for 23 psychiatric disorders.
arXiv Detail & Related papers (2026-02-13T12:21:33Z)
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis [14.82377002030236]
Mental disorders are highly prevalent worldwide. The shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. We present LingxiDiagBench, a large-scale multi-agent benchmark.
arXiv Detail & Related papers (2026-02-10T03:46:05Z)
AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations [7.061237517845673]
Mental health disorders remain among the leading causes of disability worldwide. Conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed. In primary care settings, studies show that providers misidentify depression or anxiety in over 60% of cases.
arXiv Detail & Related papers (2025-10-16T17:50:04Z)
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs). Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate an (oral) examination in medical training. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
Interpretable Neuropsychiatric Diagnosis via Concept-Guided Graph Neural Networks [56.75602443936853]
One in five adolescents currently live with a diagnosed mental or behavioral health condition, such as anxiety, depression, or conduct disorder. While prior works use graph neural network (GNN) approaches for disorder prediction, they remain black boxes, limiting their reliability and clinical translation. In this work, we propose a concept-based diagnosis framework that encodes interpretable functional connectivity concepts. Our design ensures predictions through clinically meaningful connectivity patterns, enabling both interpretability and strong predictive performance.
arXiv Detail & Related papers (2025-10-02T19:38:46Z)
Unveiling the Landscape of Clinical Depression Assessment: From Behavioral Signatures to Psychiatric Reasoning [43.26860213892083]
Depression is a widespread mental disorder that affects millions worldwide. Most studies rely on limited or non-clinically validated data, and often prioritize complex model design over real-world effectiveness. We introduce C-MIND, a clinical neuropsychiatric multimodal diagnosis dataset collected over two years from real hospital visits. Each participant completes three structured psychiatric tasks and receives a final diagnosis from expert clinicians, with informative audio, video, transcript, and functional near-infrared spectroscopy (fNIRS) signals recorded.
arXiv Detail & Related papers (2025-08-06T15:13:24Z)
MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z)
Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling [50.83055329849865]
PsyLLM is a large language model designed to integrate diagnostic and therapeutic reasoning for mental health counseling. It processes real-world mental health posts from Reddit and generates multi-turn dialogue structures. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2025-05-21T16:24:49Z)
MAGI: Multi-Agent Guided Interview for Psychiatric Assessment [50.6150986786028]
We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational navigation. We show that MAGI advances LLM-assisted mental health assessment by combining clinical rigor, conversational adaptability, and explainable reasoning.
arXiv Detail & Related papers (2025-04-25T11:08:27Z)
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice [20.166682569070073]
Large Language Models (LLMs) offer potential solutions to address problems such as shortage of medical resources and low diagnostic consistency in psychiatric clinical practice. We propose a benchmarking system, PsychBench, to evaluate the practical performance of LLMs in psychiatric clinical settings. We show that while existing models demonstrate significant potential, they are not yet adequate as decision-making tools in psychiatric clinical practice.
arXiv Detail & Related papers (2025-02-28T12:17:41Z)
SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy [45.2233252981348]
Large Language Models (LLMs) have been shown to encode clinical knowledge. We present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models. We show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
arXiv Detail & Related papers (2024-07-03T11:02:12Z)
Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.