Related papers: TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine

URL: http://arxiv.org/abs/2511.07148v1
Date: Mon, 10 Nov 2025 14:35:25 GMT
Title: TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine
Authors: Zihao Cheng, Yuheng Lu, Huaiqian Ye, Zeming Liu, Minqi Wang, Jingjing Liu, Zihan Li, Wei Fan, Yuanfang Guo, Ruiji Fu, Shifeng She, Gang Wang, Yunhong Wang
Abstract summary: We introduce TCM-Eval, the first dynamic and high-quality benchmark for Traditional Chinese Medicine (TCM). We construct a large-scale training corpus and propose Self-Iterative Chain-of-Thought Enhancement (SI-CoTE). Using this enriched training data, we develop ZhiMingTang (ZMT), a state-of-the-art LLM specifically designed for TCM.
Score: 51.01817637808011
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in modern medicine, yet their application in Traditional Chinese Medicine (TCM) remains severely limited by the absence of standardized benchmarks and the scarcity of high-quality training data. To address these challenges, we introduce TCM-Eval, the first dynamic and extensible benchmark for TCM, meticulously curated from national medical licensing examinations and validated by TCM experts. Furthermore, we construct a large-scale training corpus and propose Self-Iterative Chain-of-Thought Enhancement (SI-CoTE) to autonomously enrich question-answer pairs with validated reasoning chains through rejection sampling, establishing a virtuous cycle of data and model co-evolution. Using this enriched training data, we develop ZhiMingTang (ZMT), a state-of-the-art LLM specifically designed for TCM, which significantly exceeds the passing threshold for human practitioners. To encourage future research and development, we release a public leaderboard, fostering community engagement and continuous improvement.
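The abstract describes SI-CoTE only at a high level: question-answer pairs are enriched with reasoning chains, and rejection sampling keeps only validated chains. A minimal sketch of that filtering step follows. It assumes that "validated" means the chain's final answer matches the known answer, which the abstract does not spell out. The names `sample_validated_chain`, `enrich`, and `toy_generate` are hypothetical, and `toy_generate` is a deterministic stand-in for a real model call. The self-iterative part (retraining the model and retrying rejected items) is not shown.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical generator interface: (question, attempt index) -> (reasoning chain, predicted answer).
Generator = Callable[[str, int], Tuple[str, str]]


def sample_validated_chain(
    question: str, gold_answer: str, generate: Generator, max_attempts: int = 4
) -> Optional[str]:
    """Return the first sampled chain whose answer matches the gold answer, else None."""
    for attempt in range(max_attempts):
        chain, predicted = generate(question, attempt)
        # Assumed validation rule: accept only if the predicted answer equals the gold answer.
        if predicted.strip().upper() == gold_answer.strip().upper():
            return chain
    return None


def enrich(
    qa_pairs: Iterable[Tuple[str, str]], generate: Generator, max_attempts: int = 4
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Split QA pairs into enriched records (with a chain) and rejected pairs (no valid chain)."""
    enriched: List[Dict[str, str]] = []
    rejected: List[Tuple[str, str]] = []
    for question, gold in qa_pairs:
        chain = sample_validated_chain(question, gold, generate, max_attempts)
        if chain is None:
            rejected.append((question, gold))
        else:
            enriched.append({"question": question, "answer": gold, "reasoning": chain})
    return enriched, rejected


def toy_generate(question: str, attempt: int) -> Tuple[str, str]:
    """Stand-in for an LLM call: cycles through options A, B, C, D by attempt index."""
    predicted = "ABCD"[attempt % 4]
    return f"(attempt {attempt}) reasoning that leads to {predicted}", predicted


qa = [("Q1", "B"), ("Q2", "D")]
enriched, rejected = enrich(qa, toy_generate, max_attempts=3)
print(len(enriched), rejected)
```

In an iterative setup, the rejected pairs would presumably be retried after the model has been fine-tuned on the enriched records. That loop is an assumption about the method and is left out here.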

Related papers

TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine [11.944521938566231]
Large language models (LLMs) have demonstrated exceptional capabilities in general domains, yet their application in highly specialized and culturally-rich fields like Traditional Chinese Medicine (TCM) requires rigorous evaluation. TCM-5CEval is designed to assess LLMs across five critical dimensions: (1) Core Knowledge (TCM-seek), (2) Classical Literacy (TCM-LitQA), (3) Clinical Decision-making (TCM-MRCD), (4) Chinese Materia Medica (TCM-CMM), and (5) Clinical Non-pharmacological Therapy (TCM-ClinNPT).
arXiv Detail & Related papers (2025-11-17T09:15:41Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine [9.74563376905193]
We introduce Ladder-base, the first TCM-focused large language model trained with Group Relative Policy Optimization. Ladder-base is built upon the Qwen2.5-7B-Instruct foundation model and trained exclusively on the textual subset of the TCM-Ladder benchmark.
arXiv Detail & Related papers (2025-10-20T10:43:33Z)
ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine [53.91744478760689]
We present ShizhenGPT, the first multimodal language model tailored for Traditional Chinese Medicine (TCM). ShizhenGPT is pretrained and instruction-tuned to achieve deep TCM knowledge and multimodal reasoning. Experiments demonstrate that ShizhenGPT outperforms comparable-scale LLMs and competes with larger proprietary models.
arXiv Detail & Related papers (2025-08-20T13:30:20Z)
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph [57.54231831309079]
We introduce MedKGent, a framework for constructing temporally evolving medical Knowledge Graphs. We simulate the emergence of biomedical knowledge via a fine-grained daily time series. The resulting KG contains 156,275 entities and 2,971,384 relational triples.
arXiv Detail & Related papers (2025-08-17T15:14:03Z)
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice [15.020917068333237]
Tianyi is designed to assimilate interconnected and systematic TCM knowledge in a progressive learning manner. Extensive evaluations demonstrate the significant potential of Tianyi as an AI assistant in TCM clinical practice and research.
arXiv Detail & Related papers (2025-05-19T14:17:37Z)
BianCang: A Traditional Chinese Medicine Large Language Model [33.738284400742124]
We propose BianCang, a TCM-specific large language model (LLM) that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. We constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM.
arXiv Detail & Related papers (2024-11-17T10:17:01Z)
Competence-based Multimodal Curriculum Learning for Medical Report Generation [98.10763792453925]
We propose a Competence-based Multimodal Curriculum Learning framework (CMCL) to alleviate the data bias and make the best use of available data. Specifically, CMCL simulates the learning process of radiologists and optimizes the model step by step. Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
arXiv Detail & Related papers (2022-06-24T08:16:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.