BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
- URL: http://arxiv.org/abs/2510.17415v1
- Date: Mon, 20 Oct 2025 10:57:37 GMT
- Title: BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
- Authors: Jiacheng Xie, Yang Yu, Yibo Chen, Hanyao Zhang, Lening Zhao, Jiaxuan He, Lei Jiang, Xiaoting Tang, Guanghui An, Dong Xu
- Abstract summary: Traditional Chinese Medicine (TCM) plays a role in global healthcare. Applying large language models (LLMs) to TCM remains challenging due to its reliance on holistic reasoning, implicit logic, and multimodal diagnostic cues. We developed BenCao, a ChatGPT-based multimodal assistant for TCM, integrating structured knowledge bases, diagnostic data, and expert feedback refinement. BenCao was trained through natural language instruction tuning rather than parameter retraining, aligning with expert-level reasoning and ethical norms specific to TCM.
- Score: 11.485720230834922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional Chinese Medicine (TCM), with a history spanning over two millennia, plays a role in global healthcare. However, applying large language models (LLMs) to TCM remains challenging due to its reliance on holistic reasoning, implicit logic, and multimodal diagnostic cues. Existing TCM-domain LLMs have made progress in text-based understanding but lack multimodal integration, interpretability, and clinical applicability. To address these limitations, we developed BenCao, a ChatGPT-based multimodal assistant for TCM, integrating structured knowledge bases, diagnostic data, and expert feedback refinement. BenCao was trained through natural language instruction tuning rather than parameter retraining, aligning with expert-level reasoning and ethical norms specific to TCM. The system incorporates a comprehensive knowledge base of over 1,000 classical and modern texts, a scenario-based instruction framework for diverse interactions, a chain-of-thought simulation mechanism for interpretable reasoning, and a feedback refinement process involving licensed TCM practitioners. BenCao connects to external APIs for tongue-image classification and multimodal database retrieval, enabling dynamic access to diagnostic resources. In evaluations across single-choice question benchmarks and multimodal classification tasks, BenCao achieved superior accuracy to general-domain and TCM-domain models, particularly in diagnostics, herb recognition, and constitution classification. The model was deployed as an interactive application on the OpenAI GPTs Store, accessed by nearly 1,000 users globally as of October 2025. This study demonstrates the feasibility of developing a TCM-domain LLM through natural language-based instruction tuning and multimodal integration, offering a practical framework for aligning generative AI with traditional medical reasoning and a scalable pathway for real-world deployment.
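The abstract describes an orchestration pattern: instruction-tuned reasoning combined with calls to external APIs for tongue-image classification and knowledge-base retrieval, whose outputs feed a chain-of-thought style prompt. The sketch below illustrates one way such routing could look in Python. It is a minimal sketch under stated assumptions: the endpoint URLs, function names, and response fields are hypothetical and do not reflect BenCao's published implementation.

```python
# Illustrative sketch only: endpoint URLs, payload shapes, and field names are
# hypothetical assumptions; BenCao's actual tool configuration is not described here.
import requests

TONGUE_API = "https://example.org/api/tongue-classify"   # hypothetical endpoint
KB_SEARCH_API = "https://example.org/api/tcm-kb/search"  # hypothetical endpoint


def classify_tongue(image_path: str) -> dict:
    """Send a tongue photo to an external classifier and return its findings."""
    with open(image_path, "rb") as fh:
        resp = requests.post(TONGUE_API, files={"image": fh}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"coating": "thick-white", "body_color": "pale"}


def retrieve_kb_passages(query: str, top_k: int = 3) -> list[str]:
    """Look up supporting passages from a TCM knowledge base of classical and modern texts."""
    resp = requests.get(KB_SEARCH_API, params={"q": query, "k": top_k}, timeout=30)
    resp.raise_for_status()
    return [hit["text"] for hit in resp.json().get("hits", [])]


def build_prompt(symptoms: str, tongue_findings: dict, passages: list[str]) -> str:
    """Compose a chain-of-thought style prompt that grounds the model's reasoning."""
    evidence = "\n".join(f"- {p}" for p in passages)
    return (
        "You are a TCM assistant. Reason step by step from the four examinations.\n"
        f"Patient-reported symptoms: {symptoms}\n"
        f"Tongue image findings: {tongue_findings}\n"
        f"Reference passages:\n{evidence}\n"
        "First summarize the pattern, then suggest a constitution type, and note "
        "that a licensed practitioner must confirm any prescription."
    )


if __name__ == "__main__":
    findings = classify_tongue("tongue.jpg")
    passages = retrieve_kb_passages("pale tongue thick white coating fatigue")
    print(build_prompt("fatigue, cold limbs, poor appetite", findings, passages))
```

In the deployed GPTs Store application, this kind of tool invocation would presumably be handled by the platform's built-in action-calling mechanism rather than hand-written glue code; the sketch only makes the data flow between image classification, retrieval, and prompt construction explicit.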
Related papers
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
- TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine [51.01817637808011]
We introduce TCM-Eval, the first dynamic and high-quality benchmark for Traditional Chinese Medicine (TCM). We construct a large-scale training corpus and propose Self-Iterative Chain-of-Thought Enhancement (SI-CoTE). Using this enriched training data, we develop ZhiMingTang (ZMT), a state-of-the-art LLM specifically designed for TCM.
arXiv Detail & Related papers (2025-11-10T14:35:25Z)
- RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z)
- A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making [49.048767633316764]
KAMAC is a Knowledge-driven Adaptive Multi-Agent Collaboration framework. It enables agents to dynamically form and expand expert teams based on the evolving diagnostic context. Experiments on two real-world medical benchmarks demonstrate that KAMAC significantly outperforms both single-agent and advanced multi-agent methods.
arXiv Detail & Related papers (2025-09-18T14:33:36Z)
- ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine [53.91744478760689]
We present ShizhenGPT, the first multimodal language model tailored for Traditional Chinese Medicine (TCM). ShizhenGPT is pretrained and instruction-tuned to achieve deep TCM knowledge and multimodal reasoning. Experiments demonstrate that ShizhenGPT outperforms comparable-scale LLMs and competes with larger proprietary models.
arXiv Detail & Related papers (2025-08-20T13:30:20Z)
- MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine [36.08458917280579]
MTCMB comprises 12 sub-datasets spanning five major categories: knowledge QA, language understanding, diagnostic reasoning, prescription generation, and safety evaluation. Preliminary results indicate that current LLMs perform well on foundational knowledge but fall short in clinical reasoning, prescription planning, and safety compliance.
arXiv Detail & Related papers (2025-06-02T02:01:40Z)
- TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine [21.46828174190836]
We introduce TCM-Ladder, the first multimodal QA dataset specifically designed for evaluating large TCM language models. The dataset spans multiple core disciplines of TCM, including fundamental theory, diagnostics, herbal formulas, internal medicine, surgery, pharmacognosy, and pediatrics. It was constructed using a combination of automated and manual filtering processes and comprises over 52,000 questions in total.
arXiv Detail & Related papers (2025-05-29T23:13:57Z)
- Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice [15.020917068333237]
Tianyi is designed to assimilate interconnected and systematic TCM knowledge in a progressive learning manner. Extensive evaluations demonstrate the significant potential of Tianyi as an AI assistant in TCM clinical practice and research.
arXiv Detail & Related papers (2025-05-19T14:17:37Z)
- PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. This paper discusses the components of PyTDC's architecture and presents, to our knowledge, a first-of-its-kind case study on the newly introduced single-cell drug-target nomination ML task.
arXiv Detail & Related papers (2025-05-08T18:15:38Z)
- OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis [2.639291045535649]
OpenTCM is a domain-specific knowledge graph and graph-based Retrieval-Augmented Generation system. We extract more than 3.73 million classical Chinese characters from 68 gynecological books in the Chinese Medical Classics Database. OpenTCM achieves mean expert scores (MES) of 4.378 in ingredient information retrieval and 4.045 in diagnostic question-answering tasks.
arXiv Detail & Related papers (2025-04-28T08:04:44Z)
- TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine [10.74071774496229]
Large language models (LLMs) excel in various NLP tasks and modern medicine, but their evaluation in traditional Chinese medicine (TCM) is underexplored. To address this, we introduce TCM-3CEval, a benchmark assessing LLMs in TCM across three dimensions: core knowledge mastery, classical text understanding, and clinical decision-making. Results show a performance hierarchy: all models have limitations in specialized areas such as Meridian & Acupoint theory and the various TCM schools, revealing gaps between current capabilities and clinical needs.
arXiv Detail & Related papers (2025-03-10T08:29:15Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
However, LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
With these domain-knowledge augmentation strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.