Exploring Low-Resource Medical Image Classification with Weakly
Supervised Prompt Learning
- URL: http://arxiv.org/abs/2402.03783v1
- Date: Tue, 6 Feb 2024 07:53:23 GMT
- Title: Exploring Low-Resource Medical Image Classification with Weakly
Supervised Prompt Learning
- Authors: Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao,
Yutong Lu
- Abstract summary: Existing pre-trained vision-language models require domain experts to carefully design the medical prompts.
We propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts.
We show that the model using our automatically generated prompts outperforms its counterpart that uses hand-crafted prompts with full-shot learning.
- Score: 21.604146757986765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most advances in medical image recognition supporting clinical auxiliary
diagnosis meet challenges due to the low-resource situation in the medical
field, where annotations are highly expensive and professional. This
low-resource problem can be alleviated by leveraging the transferable
representations of large-scale pre-trained vision-language models via relevant
medical text prompts. However, existing pre-trained vision-language models
require domain experts to carefully design the medical prompts, which greatly
increases the burden on clinicians. To address this problem, we propose a
weakly supervised prompt learning method MedPrompt to automatically generate
medical prompts, which includes an unsupervised pre-trained vision-language
model and a weakly supervised prompt learning model. The unsupervised
pre-trained vision-language model utilizes the natural correlation between
medical images and corresponding medical texts for pre-training, without any
manual annotations. The weakly supervised prompt learning model only utilizes
the classes of images in the dataset to guide the learning of the specific
class vector in the prompt, while the learning of other context vectors in the
prompt requires no manual annotations for guidance. To the best of our
knowledge, this is the first model to automatically generate medical prompts.
With these prompts, the pre-trained vision-language model can be freed from its
strong dependence on experts for manual annotation and manual prompt design.
Experimental results show that, with only a minimal number of labeled samples
for few-shot learning, the model using our automatically generated prompts
outperforms its counterparts trained with hand-crafted prompts under full-shot
learning, and reaches superior or comparable accuracy on zero-shot image
classification. The proposed prompt generator is lightweight and can therefore
be embedded into any network architecture.
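To make the weakly supervised prompt learning scheme concrete, below is a minimal PyTorch sketch (not the authors' code): a block of shared, annotation-free context vectors plus one learnable class vector per category is passed through a stand-in text encoder and matched against frozen image features, with image-level class labels as the only supervision. Module names, dimensions, and the toy text encoder are illustrative assumptions.

```python
# Minimal sketch of learnable-prompt classification in the spirit of MedPrompt
# (hypothetical simplification, not the paper's code). Class vectors are
# class-specific; context vectors are shared and class-agnostic; image-level
# class labels are the only annotation used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePrompt(nn.Module):
    def __init__(self, num_classes: int, ctx_len: int = 8, dim: int = 512):
        super().__init__()
        self.context = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)  # shared context vectors
        self.class_embed = nn.Embedding(num_classes, dim)              # one class vector per category

    def forward(self) -> torch.Tensor:
        # Build one prompt per class: [ctx_1, ..., ctx_L, class_c]
        num_classes = self.class_embed.num_embeddings
        ctx = self.context.unsqueeze(0).expand(num_classes, -1, -1)
        cls = self.class_embed.weight.unsqueeze(1)
        return torch.cat([ctx, cls], dim=1)                            # (C, L+1, dim)

class PromptClassifier(nn.Module):
    """Scores image features against per-class prompts, CLIP-style."""
    def __init__(self, num_classes: int, dim: int = 512):
        super().__init__()
        self.prompt = LearnablePrompt(num_classes, dim=dim)
        # Toy stand-in for the frozen text encoder of the pre-trained VLM.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)
        self.logit_scale = nn.Parameter(torch.tensor(4.6))             # ~log(100), CLIP-style

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        prompts = self.prompt()                                        # (C, L+1, dim)
        text_feats = self.text_encoder(prompts).mean(dim=1)            # pool tokens -> (C, dim)
        img = F.normalize(image_features, dim=-1)
        txt = F.normalize(text_feats, dim=-1)
        return self.logit_scale.exp() * img @ txt.t()                  # (B, C) logits

# Weak supervision: image-level class labels are the only annotation.
model = PromptClassifier(num_classes=5)
image_features = torch.randn(4, 512)       # stand-in for frozen image-encoder output
labels = torch.randint(0, 5, (4,))
loss = F.cross_entropy(model(image_features), labels)
loss.backward()
```

In the full method, both encoders would come from the unsupervised pre-trained vision-language model and stay frozen; only the prompt vectors are optimized, which is what keeps the generator lightweight.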
Related papers
- DualPrompt-MedCap: A Dual-Prompt Enhanced Approach for Medical Image Captioning [5.456249017636404]
We present DualPrompt-MedCap, a novel dual-prompt enhancement framework that augments Large Vision-Language Models (LVLMs) with two prompts: a modality-aware prompt derived from a semi-supervised classification model pretrained on medical question-answer pairs, and a question-guided prompt leveraging biomedical language model embeddings.
Our method enables the generation of clinically accurate reports that can serve as medical experts' prior knowledge and automatic annotations for downstream vision-language tasks.
arXiv Detail & Related papers (2025-04-13T14:31:55Z)
- Curriculum Prompting Foundation Models for Medical Image Segmentation [17.33821260899367]
Adapting large pre-trained foundation models, e.g., SAM, for medical image segmentation remains a significant challenge.
Past works have been heavily reliant on a singular type of prompt for each instance, necessitating manual input of an ideally correct prompt.
We propose to utilize prompts of different granularity, which are sourced from original images to provide a broader scope of clinical insights.
In response, we have designed a coarse-to-fine mechanism, referred to as curriculum prompting, that progressively integrates prompts of different types.
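As a rough illustration of the coarse-to-fine idea, the sketch below uses SAM's public prompting interface: a coarse box prompt is issued first, then point prompts together with the previous low-resolution mask logits refine the result. This is a hedged stand-in for the general mechanism, not the paper's curriculum prompting implementation; the checkpoint path, image, box, and click coordinates are placeholders.

```python
# Hedged sketch of coarse-to-fine prompting with SAM's public API (not the paper's code).
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder weights path
predictor = SamPredictor(sam)
predictor.set_image(np.zeros((512, 512, 3), dtype=np.uint8))          # stand-in for an RGB medical slice

# Stage 1 (coarse): a rough bounding box around the suspected region.
coarse_box = np.array([100, 120, 300, 340])                           # XYXY, placeholder coordinates
masks, scores, logits = predictor.predict(box=coarse_box, multimask_output=False)

# Stage 2 (fine): add point prompts and reuse the previous low-resolution mask
# logits, progressively integrating finer-grained prompts.
points = np.array([[200, 230], [150, 300]])                           # placeholder clicks
point_labels = np.array([1, 0])                                       # 1 = foreground, 0 = background
refined_masks, refined_scores, _ = predictor.predict(
    point_coords=points, point_labels=point_labels,
    mask_input=logits, multimask_output=False)
```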
arXiv Detail & Related papers (2024-09-01T11:00:18Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
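One way to picture the self-constructed preference step is the sketch below: the model's own description from a well-posed prompt is treated as preferred, a description generated from a corrupted input or misleading prompt as dispreferred, and a direct-preference-style objective is applied. The loss form, variable names, and numbers are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of a preference objective over self-generated image descriptions.
import torch
import torch.nn.functional as F

def preference_loss(logp_pref, logp_dispref, ref_logp_pref, ref_logp_dispref, beta=0.1):
    """DPO-style objective on sequence log-likelihoods of the two descriptions."""
    margin = (logp_pref - ref_logp_pref) - (logp_dispref - ref_logp_dispref)
    return -F.logsigmoid(beta * margin).mean()

# Stand-in log-likelihoods under the current model and a frozen reference copy.
policy_good, policy_bad = torch.tensor([-42.0]), torch.tensor([-55.0])
ref_good, ref_bad = torch.tensor([-44.0]), torch.tensor([-50.0])
print(preference_loss(policy_good, policy_bad, ref_good, ref_bad))
```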
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification [3.1029532920699934]
We introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP).
Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on the prior knowledge of multi-modal features.
Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts.
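Below is a minimal sketch of an RNN decoder that autoregressively emits pseudo-prompt embeddings conditioned on image features; the class, dimensions, and conditioning scheme are simplifying assumptions, not PsPG's actual architecture.

```python
# Hedged sketch: a GRU decoder rolls out a sequence of prompt embeddings
# ("pseudo-prompts") conditioned on image features.
import torch
import torch.nn as nn

class PseudoPromptDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512, prompt_dim: int = 512, prompt_len: int = 8):
        super().__init__()
        self.prompt_len = prompt_len
        self.init_h = nn.Linear(feat_dim, prompt_dim)       # condition the hidden state on the image
        self.cell = nn.GRUCell(prompt_dim, prompt_dim)
        self.start = nn.Parameter(torch.zeros(prompt_dim))  # learned start-of-prompt token
        self.out = nn.Linear(prompt_dim, prompt_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:  # (B, feat_dim)
        h = torch.tanh(self.init_h(image_features))
        tok = self.start.expand(image_features.size(0), -1)
        prompts = []
        for _ in range(self.prompt_len):                     # autoregressive roll-out
            h = self.cell(tok, h)
            tok = self.out(h)                                # next pseudo-prompt vector
            prompts.append(tok)
        return torch.stack(prompts, dim=1)                   # (B, prompt_len, prompt_dim)

pseudo_prompts = PseudoPromptDecoder()(torch.randn(2, 512))
print(pseudo_prompts.shape)  # torch.Size([2, 8, 512])
```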
arXiv Detail & Related papers (2024-05-10T13:27:32Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting [12.166472806042592]
Automatic extraction of medical information from clinical documents poses several challenges.
Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data.
We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% in accuracy.
arXiv Detail & Related papers (2024-03-20T08:01:33Z)
- XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization [4.634780391920529]
We propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts.
Our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models.
Our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI.
arXiv Detail & Related papers (2024-03-14T14:02:01Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
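The global contrastive component can be pictured with a standard symmetric image-text InfoNCE loss, sketched below; the divergence encoder, token-knowledge-patch alignment, and knowledge-guided category-level terms are omitted, so this is only a hedged approximation of one piece of the framework.

```python
# Hedged sketch of a global image-report contrastive loss (symmetric InfoNCE).
import torch
import torch.nn.functional as F

def global_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature                    # (B, B) pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Match each image to its paired report and each report to its paired image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = global_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```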
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision-language models (VLMs).
We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and a version of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.