Related papers: ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

URL: http://arxiv.org/abs/2508.14706v1
Date: Wed, 20 Aug 2025 13:30:20 GMT
Title: ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
Authors: Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang
Abstract summary: We present ShizhenGPT, the first multimodal language model tailored for Traditional Chinese Medicine (TCM). ShizhenGPT is pretrained and instruction-tuned to achieve deep TCM knowledge and multimodal reasoning. Experiments demonstrate that ShizhenGPT outperforms comparable-scale LLMs and competes with larger proprietary models.
Score: 53.91744478760689
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the success of large language models (LLMs) in various domains, their potential in Traditional Chinese Medicine (TCM) remains largely underexplored due to two critical barriers: (1) the scarcity of high-quality TCM data and (2) the inherently multimodal nature of TCM diagnostics, which involve looking, listening, smelling, and pulse-taking. These sensory-rich modalities are beyond the scope of conventional LLMs. To address these challenges, we present ShizhenGPT, the first multimodal LLM tailored for TCM. To overcome data scarcity, we curate the largest TCM dataset to date, comprising 100GB+ of text and 200GB+ of multimodal data, including 1.2M images, 200 hours of audio, and physiological signals. ShizhenGPT is pretrained and instruction-tuned to achieve deep TCM knowledge and multimodal reasoning. For evaluation, we collect recent national TCM qualification exams and build a visual benchmark for Medicinal Recognition and Visual Diagnosis. Experiments demonstrate that ShizhenGPT outperforms comparable-scale LLMs and competes with larger proprietary models. Moreover, it leads in TCM visual understanding among existing multimodal LLMs and demonstrates unified perception across modalities like sound, pulse, smell, and vision, paving the way toward holistic multimodal perception and diagnosis in TCM. Datasets, models, and code are publicly available. We hope this work will inspire further exploration in this field.

Related papers

MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine [11.944521938566231]
Large language models (LLMs) have demonstrated exceptional capabilities in general domains, yet their application in highly specialized and culturally rich fields like Traditional Chinese Medicine (TCM) requires rigorous evaluation. TCM-5CEval is designed to assess LLMs across five critical dimensions: (1) Core Knowledge (TCM-seek), (2) Classical Literacy (TCM-LitQA), (3) Clinical Decision-making (TCM-MRCD), (4) Chinese Materia Medica (TCM-CMM), and (5) Clinical Non-pharmacological Therapy (TCM-ClinNPT).
arXiv Detail & Related papers (2025-11-17T09:15:41Z)
TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine [51.01817637808011]
We introduce TCM-Eval, the first dynamic and high-quality benchmark for Traditional Chinese Medicine (TCM). We construct a large-scale training corpus and propose Self-Iterative Chain-of-Thought Enhancement (SI-CoTE). Using this enriched training data, we develop ZhiMingTang (ZMT), a state-of-the-art LLM specifically designed for TCM.
arXiv Detail & Related papers (2025-11-10T14:35:25Z)
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration [57.98393950821579]
We introduce the Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis (MAM). Inspired by our empirical findings, MAM decomposes the medical diagnostic process into specialized roles: a General Practitioner, Specialist Team, Radiologist, Medical Assistant, and Director. This modular and collaborative framework enables efficient knowledge updates and leverages existing medical LLMs and knowledge bases.
arXiv Detail & Related papers (2025-06-24T17:52:43Z)
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge. We then introduce our medical-specialized MLLM: Lingshu. Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice [15.020917068333237]
Tianyi is designed to assimilate interconnected and systematic TCM knowledge in a progressive learning manner. Extensive evaluations demonstrate the significant potential of Tianyi as an AI assistant in TCM clinical practice and research.
arXiv Detail & Related papers (2025-05-19T14:17:37Z)
OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis [2.639291045535649]
OpenTCM is a domain-specific knowledge graph and Graph-based Retrieval-Augmented Generation system. We extract more than 3.73 million classical Chinese characters from 68 gynecological books in the Chinese Medical Classics Database. OpenTCM achieves mean expert scores (MES) of 4.378 in ingredient information retrieval and 4.045 in diagnostic question-answering tasks.
arXiv Detail & Related papers (2025-04-28T08:04:44Z)
TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine [10.74071774496229]
Large language models (LLMs) excel in various NLP tasks and modern medicine, but their evaluation in traditional Chinese medicine (TCM) is underexplored. To address this, we introduce TCM-3CEval, a benchmark assessing LLMs in TCM across three dimensions: core knowledge mastery, classical text understanding, and clinical decision-making. Results show a performance hierarchy: all models have limitations in specialized areas such as Meridian & Acupoint theory and Various TCM Schools, revealing gaps between current capabilities and clinical needs.
arXiv Detail & Related papers (2025-03-10T08:29:15Z)
BianCang: A Traditional Chinese Medicine Large Language Model [22.582027277167047]
BianCang is a TCM-specific large language model (LLM) that first injects domain-specific knowledge and then aligns it through targeted stimulation. We constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continuous pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM.
arXiv Detail & Related papers (2024-11-17T10:17:01Z)
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [118.75449542080746]
This paper presents the first systematic investigation of hallucinations in large multimodal models (LMMs). Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning.
arXiv Detail & Related papers (2024-10-16T17:59:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.