Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs
- URL: http://arxiv.org/abs/2504.03786v3
- Date: Tue, 15 Apr 2025 08:51:02 GMT
- Title: Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs
- Authors: Sifan Li, Yujun Cai, Bryan Hooi, Nanyun Peng, Yiwei Wang,
- Abstract summary: Traditional Chinese Medicine (TCM) has seen increasing adoption in healthcare, with specialized Large Language Models (LLMs) emerging to support clinical applications.<n>A fundamental requirement for these models is accurate identification of TCM drug ingredients.<n>Our systematic analysis reveals consistent failure patterns: models often interpret drug names literally, overuse common herbs regardless of relevance, and exhibit erratic behaviors when faced with unfamiliar formulations.
- Score: 79.00288739947406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional Chinese Medicine (TCM) has seen increasing adoption in healthcare, with specialized Large Language Models (LLMs) emerging to support clinical applications. A fundamental requirement for these models is accurate identification of TCM drug ingredients. In this paper, we evaluate how general and TCM-specialized LLMs perform when identifying ingredients of Chinese drugs. Our systematic analysis reveals consistent failure patterns: models often interpret drug names literally, overuse common herbs regardless of relevance, and exhibit erratic behaviors when faced with unfamiliar formulations. LLMs also fail to understand the verification task. These findings demonstrate that current LLMs rely primarily on drug names rather than possessing systematic pharmacological knowledge. To address these limitations, we propose a Retrieval Augmented Generation (RAG) approach focused on ingredient names. Experiments across 220 TCM formulations show our method significantly improves accuracy from approximately 50% to 82% in ingredient verification tasks. Our work highlights critical weaknesses in current TCM-specific LLMs and offers a practical solution for enhancing their clinical reliability.
Related papers
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.<n>We propose a novel approach utilizing structured medical reasoning.<n>Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored.<n>Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities.<n>We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z) - Reliable and diverse evaluation of LLM medical knowledge mastery [6.825565574784612]
We propose a novel framework that generates reliable and diverse test samples to evaluate medical-specific LLMs.
We use our proposed framework to systematically investigate the mastery of medical factual knowledge of 12 well-known LLMs.
arXiv Detail & Related papers (2024-09-22T03:13:38Z) - Rx Strategist: Prescription Verification using LLM Agents System [0.0]
Rx Strategist uses knowledge graphs and different search strategies to enhance the power of Large Language Models (LLMs) inside an agentic framework.
This multifaceted technique allows for a multi-stage LLM pipeline and reliable information retrieval from a custom-built active ingredient database.
Our findings demonstrate that Rx Strategist surpasses many current LLMs, achieving performance comparable to that of a highly experienced clinical pharmacist.
arXiv Detail & Related papers (2024-09-05T11:42:26Z) - MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance [17.008132675107355]
This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources.
We present MALADE, the first effective collaborative multi-agent system powered by Large Language Models with Retrieval Augmented Generation for ADE extraction from drug label data.
arXiv Detail & Related papers (2024-08-03T22:14:13Z) - Emerging Opportunities of Using Large Language Models for Translation
Between Drug Molecules and Indications [6.832024637226738]
We propose a new task, which is the translation between drug molecules and corresponding indications.
The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases.
arXiv Detail & Related papers (2024-02-14T21:33:13Z) - Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)<n>Our research aims to transform existing medication recommendation methodologies using LLMs.<n>To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.