Medical Knowledge Intervention Prompt Tuning for Medical Image Classification
- URL: http://arxiv.org/abs/2511.12639v1
- Date: Sun, 16 Nov 2025 15:09:46 GMT
- Title: Medical Knowledge Intervention Prompt Tuning for Medical Image Classification
- Authors: Ye Du, Nanxi Yu, Shujun Wang
- Abstract summary: We introduce CILMP (Conditional Intervention of Large Language Models for Prompt tuning), which bridges Large Language Models (LLMs) and vision-language foundation models (VLMs) to transfer medical knowledge into VLM prompts. CILMP consistently outperforms state-of-the-art prompt tuning methods.
- Score: 9.836162358361687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language foundation models (VLMs) have shown great potential in feature transfer and generalization across a wide spectrum of medical downstream tasks. However, fine-tuning these models is resource-intensive due to their large number of parameters. Prompt tuning has emerged as a viable solution for mitigating memory usage and reducing training time while maintaining competitive performance. Nevertheless, existing prompt tuning methods cannot precisely distinguish different kinds of medical concepts, and so miss essential disease-specific features across the various imaging modalities used in medical image classification. We find that Large Language Models (LLMs), trained on extensive text corpora, are particularly adept at providing this specialized medical knowledge. Motivated by this, we propose incorporating LLMs into the prompt tuning process. Specifically, we introduce CILMP (Conditional Intervention of Large Language Models for Prompt tuning), a method that bridges LLMs and VLMs to transfer medical knowledge into VLM prompts. CILMP extracts disease-specific representations from LLMs, intervenes on them within a low-rank linear subspace, and uses them to create disease-specific prompts. Additionally, a conditional mechanism conditions the intervention process on each individual medical image, generating instance-adaptive prompts and thus enhancing adaptability. Extensive experiments across diverse medical image datasets demonstrate that CILMP consistently outperforms state-of-the-art prompt tuning methods. Code is available at https://github.com/usr922/cilmp.
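To make the mechanism concrete, below is a minimal PyTorch sketch of the idea the abstract describes: an LLM-derived disease representation is shifted within a learned low-rank subspace, with the shift coefficients conditioned on the image feature, and the result is projected into prompt tokens. All module and variable names here are hypothetical; consult the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn

class LowRankPromptIntervention(nn.Module):
    """Sketch of a conditional low-rank intervention: shift an LLM-derived
    disease embedding inside a learned subspace, conditioned on the image,
    then project it into prompt tokens for a CLIP-style text encoder."""

    def __init__(self, llm_dim, prompt_dim, n_tokens, rank=4):
        super().__init__()
        # Low-rank basis U spanning the intervention subspace.
        self.U = nn.Parameter(torch.randn(llm_dim, rank) * 0.02)
        # Conditioning network: image feature -> coefficients in the subspace.
        self.cond = nn.Linear(prompt_dim, rank)
        # Project the intervened representation into learnable prompt tokens.
        self.to_prompt = nn.Linear(llm_dim, n_tokens * prompt_dim)
        self.n_tokens, self.prompt_dim = n_tokens, prompt_dim

    def forward(self, disease_repr, image_feat):
        # disease_repr: (B, llm_dim) LLM embedding of a disease description
        # image_feat:   (B, prompt_dim) visual feature of the input image
        coeff = self.cond(image_feat)                 # (B, rank)
        intervened = disease_repr + coeff @ self.U.T  # shift within span(U)
        prompts = self.to_prompt(intervened)
        return prompts.view(-1, self.n_tokens, self.prompt_dim)
```

Under this reading, the resulting instance-adaptive tokens would replace hand-crafted prompt words at the input of the VLM text encoder.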
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning [20.195025131749944]
We present MRG-LLM, a novel multimodal large language model (MLLM) that combines a frozen LLM with a learnable visual encoder. We propose two implementations: prompt-wise and promptbook-wise customization, enabling precise and targeted report generation.
arXiv Detail & Related papers (2025-06-18T14:09:34Z)
- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge. We then introduce our medical-specialized MLLM, Lingshu. Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
- LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? [59.81732629438753]
We propose LLaVA-RadZ, a simple yet effective framework for zero-shot medical disease recognition that utilizes existing MLLM features. Specifically, we design an end-to-end training strategy, termed Decoding-Side Feature Alignment Training (DFAT), to take advantage of the characteristics of the MLLM decoder architecture. We also introduce a Domain Knowledge Anchoring Module (DKAM) to exploit the intrinsic medical knowledge of large models.
arXiv Detail & Related papers (2025-03-10T16:05:40Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
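For flavor, a structured-output setup can be as simple as fixing the answer schema and rejecting free-form responses; the fields below are hypothetical illustrations, not the paper's actual schema.

```python
import json

# Hypothetical schema forcing the model to reason in fixed fields.
SCHEMA_INSTRUCTION = (
    'Answer in JSON with exactly these keys: '
    '{"findings": [...], "differential": [...], '
    '"final_diagnosis": "...", "rationale": "..."}'
)

def build_structured_prompt(question: str) -> str:
    # Prepend the schema so a general-purpose LLM answers in structure.
    return f"{SCHEMA_INSTRUCTION}\n\nQuestion: {question}"

def parse_structured_answer(raw: str) -> dict:
    # Reject answers that do not follow the schema.
    answer = json.loads(raw)
    required = {"findings", "differential", "final_diagnosis", "rationale"}
    missing = required - answer.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return answer
```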
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback [57.98393950821579]
We propose UMed-LVLM, a novel model designed to unveil medical abnormalities. We also propose a prompt method that utilizes GPT-4V to generate diagnoses based on identified abnormal areas in medical images. UMed-LVLM significantly outperforms existing Med-LVLMs in identifying and understanding medical abnormalities.
arXiv Detail & Related papers (2025-01-02T17:37:20Z)
- Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding [9.144030136201476]
Multimodal large language models (MLLMs) inherit the superior text understanding capabilities of LLMs and extend these capabilities to multimodal scenarios.
These models achieve excellent results on general-domain multimodal tasks.
However, in the medical domain, the substantial training costs and the requirement for extensive medical data pose challenges to the development of medical MLLMs.
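One common way to sidestep those training costs is a low-rank adapter such as LoRA: freeze the pretrained weights and train only a small low-rank update. A generic sketch of the technique (not this paper's implementation) follows.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update
    (standard LoRA; illustrative, not the paper's code)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Only A and B receive gradients; the base path stays frozen.
        return self.base(x) + (x @ self.A @ self.B) * self.scale
```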
arXiv Detail & Related papers (2024-10-31T11:07:26Z)
- Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks.
ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
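ALCD's alternating scheme is specific to medical information extraction, but the core contrastive-decoding step can be sketched generically: subtract a weaker model's logits from a stronger model's, keeping only tokens the stronger model finds plausible. The function below is illustrative, not ALCD's exact formulation.

```python
import torch

def contrastive_logits(expert_logits: torch.Tensor,
                       amateur_logits: torch.Tensor,
                       alpha: float = 0.1,
                       beta: float = 1.0) -> torch.Tensor:
    """One generic contrastive decoding step over next-token logits."""
    probs = torch.softmax(expert_logits, dim=-1)
    # Plausibility mask: keep tokens within alpha of the expert's best prob.
    keep = probs >= alpha * probs.max(dim=-1, keepdim=True).values
    # Penalize tokens the weaker model also favors (hallucination-prone).
    scores = expert_logits - beta * amateur_logits
    return scores.masked_fill(~keep, float("-inf"))
```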
arXiv Detail & Related papers (2024-10-21T07:19:19Z)
- Curriculum Prompting Foundation Models for Medical Image Segmentation [17.33821260899367]
Adapting large pre-trained foundation models, e.g., SAM, for medical image segmentation remains a significant challenge.
Past works have relied heavily on a single type of prompt for each instance, necessitating manual input of an ideally correct prompt.
We propose to utilize prompts of different granularity, sourced from the original images, to provide a broader scope of clinical insights. To this end, we design a coarse-to-fine mechanism, referred to as curriculum prompting, that progressively integrates prompts of different types.
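A minimal coarse-to-fine loop with the public segment_anything API conveys the flavor; the checkpoint path, box, and click coordinates below are placeholders, and the paper's actual curriculum and prompt sources differ.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Illustrative coarse-to-fine prompting with SAM (not the paper's code).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)
predictor.set_image(image)  # image: HxWx3 uint8 array, assumed already loaded

# Stage 1: coarse box prompt (e.g., from a detector or rough annotation).
box = np.array([80, 60, 220, 200])
masks, scores, logits = predictor.predict(box=box, multimask_output=False)

# Stage 2: refine by feeding back the coarse mask plus a point prompt.
point = np.array([[150, 130]])  # a foreground click inside the target region
label = np.array([1])
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    mask_input=logits,  # previous low-res logits act as a mask prior
    multimask_output=False,
)
```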
arXiv Detail & Related papers (2024-09-01T11:00:18Z)
- CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation [20.59298361626719]
We propose a chain-of-medical-thought approach (CoMT) to mitigate hallucinations in medical report generation. CoMT intends to imitate the cognitive process of human doctors by decomposing diagnostic procedures.
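The decomposition can be pictured as a staged prompt; the stages below are illustrative guesses, not the paper's actual chain.

```python
# Hypothetical staged prompts imitating a radiologist's workflow.
STAGES = [
    "Step 1: Describe the image modality, view, and overall quality.",
    "Step 2: List the visible anatomical structures.",
    "Step 3: Identify abnormal findings for each structure.",
    "Step 4: Summarize the findings into a final impression.",
]

def comt_prompt(case_description: str) -> str:
    # Chain the stages so the model reports intermediate reasoning.
    steps = "\n".join(STAGES)
    return f"{case_description}\n\nFollow these steps in order:\n{steps}"
```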
arXiv Detail & Related papers (2024-06-17T12:03:32Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
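A minimal mixture-of-domain-experts layer with top-1 routing illustrates the routing step; the real Med-MoE router, expert design, and training recipe differ.

```python
import torch
import torch.nn as nn

class DomainMoE(nn.Module):
    """Toy mixture of domain-specific experts with top-1 routing."""

    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (B, dim)
        weights = torch.softmax(self.router(x), dim=-1)  # (B, n_experts)
        top_w, top_idx = weights.max(dim=-1)             # pick one expert each
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = top_idx == i
            if sel.any():
                out[sel] = expert(x[sel]) * top_w[sel].unsqueeze(-1)
        return out
```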
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.