Disease-informed Adaptation of Vision-Language Models
- URL: http://arxiv.org/abs/2405.15728v1
- Date: Fri, 24 May 2024 17:18:02 GMT
- Title: Disease-informed Adaptation of Vision-Language Models
- Authors: Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan,
- Abstract summary: This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in medical image analysis.
We argue that effective adaptation of VLMs hinges on the nuanced representation learning of disease concepts.
We introduce disease-informed contextual prompting in a novel disease prototype learning framework.
- Score: 14.081146704890745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In medical image analysis, the expertise scarcity and the high cost of data annotation limits the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to the underrepresented diseases with minimal presence and new diseases entirely absent from the pretraining dataset. We argue that effective adaptation of VLMs hinges on the nuanced representation learning of disease concepts. By capitalizing on the joint visual-linguistic capabilities of VLMs, we introduce disease-informed contextual prompting in a novel disease prototype learning framework. This approach enables VLMs to grasp the concepts of new disease effectively and efficiently, even with limited data. Extensive experiments across multiple image modalities showcase notable enhancements in performance compared to existing techniques.
Related papers
- Advancing Brain Imaging Analysis Step-by-step via Progressive Self-paced Learning [0.5840945370755134]
We introduce the Progressive Self-Paced Distillation (PSPD) framework, employing an adaptive and progressive pacing and distillation mechanism.
We validate PSPD's efficacy and adaptability across various convolutional neural networks using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.
arXiv Detail & Related papers (2024-07-23T02:26:04Z) - Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training [5.819704618007536]
A visual-language model (VLM) pre-trained on natural images and text pairs poses a significant barrier when applied to medical contexts.
We propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques.
arXiv Detail & Related papers (2024-05-30T04:04:36Z) - Mitigating Object Hallucination in Large Vision-Language Models via
Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existing objects in the images.
We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE)
MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
arXiv Detail & Related papers (2024-02-13T18:59:05Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning [67.0609518552321]
We propose to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models.
By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner.
arXiv Detail & Related papers (2023-12-05T07:29:14Z) - SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for
Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z) - Transfer learning and Local interpretable model agnostic based visual
approach in Monkeypox Disease Detection and Classification: A Deep Learning
insights [0.0]
The recent development of Monkeypox disease poses a global pandemic threat when the world is still fighting Coronavirus Disease 2019 (COVID-19).
We have conducted two studies where we modified and tested six distinct deep learning models-VGG16, InceptionResNetV2, ResNet50, ResNet101, MobileNetV2, and VGG19-using transfer learning approaches.
Our preliminary computational results show that the proposed modified InceptionResNetV2 and MobileNetV2 models perform best by achieving an accuracy ranging from 93% to 99%.
arXiv Detail & Related papers (2022-11-01T18:07:34Z) - Medical Image Understanding with Pretrained Vision Language Models: A
Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to elicit knowledge from pre-trained vision language models (VLM)
We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.