Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
- URL: http://arxiv.org/abs/2501.09294v1
- Date: Thu, 16 Jan 2025 05:01:30 GMT
- Title: Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
- Authors: Harrison Fuller, Fernando Gabriela Garcia, Victor Flores
- Abstract summary: We propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA) for medical image analysis.
HiCA combines domain-specific pretraining and hierarchical contrastive learning to align visual and textual representations at multiple levels.
We evaluate our approach on two benchmark datasets, Chest X-ray and Breast Ultrasound.
- Score: 44.99833362998488
- Abstract: Few-shot learning in medical image classification presents a significant challenge due to the limited availability of annotated data and the complex nature of medical imagery. In this work, we propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA), a novel framework that leverages the capabilities of Large Vision-Language Models (LVLMs) for medical image analysis. HiCA introduces a two-stage fine-tuning strategy, combining domain-specific pretraining and hierarchical contrastive learning to align visual and textual representations at multiple levels. We evaluate our approach on two benchmark datasets, Chest X-ray and Breast Ultrasound, achieving state-of-the-art performance in both few-shot and zero-shot settings. Further analyses demonstrate the robustness, generalizability, and interpretability of our method, with substantial improvements in performance compared to existing baselines. Our work highlights the potential of hierarchical contrastive strategies in adapting LVLMs to the unique challenges of medical imaging tasks.
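The abstract describes HiCA only at a high level; as an illustration of what "hierarchical contrastive alignment at multiple levels" can mean in practice, the following is a minimal PyTorch sketch of a two-level objective combining a global image-report InfoNCE loss with a pooled region/phrase InfoNCE loss. The encoders, tensor shapes, temperature, and the weighting term `lambda_local` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of paired embeddings of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def hierarchical_contrastive_loss(img_global, txt_global, img_regions, txt_phrases,
                                  lambda_local: float = 0.5, temperature: float = 0.07):
    """Two-level alignment sketch: global image-report pairs plus pooled region/phrase features.

    img_global, txt_global: (B, D) whole-image / whole-report embeddings.
    img_regions: (B, R, D) region embeddings; txt_phrases: (B, P, D) phrase embeddings.
    The local level here simply mean-pools before a second InfoNCE; the actual HiCA
    objective may align the levels differently.
    """
    global_loss = info_nce(img_global, txt_global, temperature)
    local_loss = info_nce(img_regions.mean(dim=1), txt_phrases.mean(dim=1), temperature)
    return global_loss + lambda_local * local_loss
```

In a two-stage setup like the one described, such a loss would typically be applied after a domain-specific pretraining pass over the image encoder.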
Related papers
- Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training [5.819704618007536]
A vision-language model (VLM) pre-trained on natural image-text pairs faces a significant domain barrier when applied to medical contexts.
We propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques.
arXiv Detail & Related papers (2024-05-30T04:04:36Z)
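The entry above mentions selective sampling and hard-negative mining without further detail. A common way to realise hard-negative mining in a contrastive image-text objective is to keep, for each image, only the positive text and the highest-scoring non-matching texts in the batch. The sketch below is a generic illustration under that assumption; `top_k` and the temperature are illustrative parameters, not values from the cited paper.

```python
import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(img_emb, txt_emb, temperature: float = 0.07, top_k: int = 5):
    """InfoNCE-style loss where only the positive and the top_k hardest negatives per image count."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / temperature                          # (B, B) similarities
    B = sim.size(0)
    pos = sim.diag()                                                   # positive-pair logits
    neg = sim.masked_fill(torch.eye(B, dtype=torch.bool, device=sim.device), float("-inf"))
    hard_neg, _ = neg.topk(k=min(top_k, B - 1), dim=1)                 # hardest negatives per image
    logits = torch.cat([pos.unsqueeze(1), hard_neg], dim=1)            # positive placed at index 0
    targets = torch.zeros(B, dtype=torch.long, device=sim.device)
    return F.cross_entropy(logits, targets)
```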
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS), for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, the OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
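The concept-bottleneck entry above describes mapping latent image features onto explicit clinical concepts with a vision-language model and classifying on top of those concept scores. Below is a minimal sketch of that idea using a CLIP-style dual encoder; the encoder interfaces, the concept list, and the linear head are illustrative assumptions, not the paper's actual components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneckClassifier(nn.Module):
    """Scores an image against natural-language concepts, then classifies on those scores."""

    def __init__(self, image_encoder, text_encoder, concept_texts, num_classes):
        super().__init__()
        self.image_encoder = image_encoder                 # any module returning (B, D) features
        with torch.no_grad():
            concept_emb = text_encoder(concept_texts)      # (C, D) embeddings of concept phrases
        self.register_buffer("concept_emb", F.normalize(concept_emb, dim=-1))
        self.head = nn.Linear(len(concept_texts), num_classes)   # interpretable linear layer

    def forward(self, images):
        img = F.normalize(self.image_encoder(images), dim=-1)    # (B, D)
        concept_scores = img @ self.concept_emb.t()              # (B, C) image-concept similarity
        return self.head(concept_scores), concept_scores         # class logits + explanation
```

Here `concept_texts` would be short clinical phrases (for example, obtained by querying GPT-4, as the entry describes), so each prediction can be traced back to per-concept scores.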
- Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering [7.669872220702526]
We present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text.
The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets.
arXiv Detail & Related papers (2023-07-11T15:00:11Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variably sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
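The entry above uses text-to-image retrieval as its benchmark. For readers unfamiliar with that protocol, the snippet below shows a standard Recall@K computation over paired report and image embeddings; it is a generic evaluation sketch, not code from the paper, and assumes row i of each tensor forms a true pair.

```python
import torch
import torch.nn.functional as F

def recall_at_k(text_emb: torch.Tensor, image_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of reports whose paired image appears among the top-k retrieved images.

    text_emb, image_emb: (N, D) embeddings where row i of each tensor is a true pair.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    sim = text_emb @ image_emb.t()                                   # (N, N) text-to-image scores
    topk = sim.topk(k=min(k, sim.size(1)), dim=1).indices            # indices of retrieved images
    targets = torch.arange(sim.size(0), device=sim.device).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```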
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
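BioViL-T is summarised above as a CNN-Transformer hybrid that jointly encodes a current image and, when available, a prior image. The sketch below shows one generic way to fuse current and prior study features with self-attention over concatenated patch tokens; the actual BioViL-T architecture differs in detail, and the backbone interface, dimensions, and pooling here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MultiImageEncoder(nn.Module):
    """Fuses patch features from a current image and an optional prior image."""

    def __init__(self, cnn_backbone, dim: int = 512, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        self.cnn = cnn_backbone                  # assumed to return (B, N, dim) patch tokens
        self.time_embed = nn.Embedding(2, dim)   # 0 = current study, 1 = prior study
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, current, prior=None):
        tokens = self.cnn(current) + self.time_embed.weight[0]           # tag current-study tokens
        if prior is not None:
            prior_tokens = self.cnn(prior) + self.time_embed.weight[1]   # tag prior-study tokens
            tokens = torch.cat([tokens, prior_tokens], dim=1)
        fused = self.fusion(tokens)              # self-attention across both studies
        return fused.mean(dim=1)                 # pooled representation for alignment with text
```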
- LMFLOSS: A Hybrid Loss For Imbalanced Medical Image Classification [2.4866930218890837]
We propose a novel framework called Large Margin aware Focal (LMF) loss to mitigate the class imbalance problem in medical imaging datasets.
This framework harnesses the distinct characteristics of both constituent loss functions, enforcing wider margins for minority classes.
We provide empirical evidence that our proposed framework consistently outperforms other baseline methods.
arXiv Detail & Related papers (2022-12-24T14:19:44Z)
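The LMF loss above is described as a hybrid that enforces wider margins for minority classes. A common recipe with the same flavour combines an LDAM-style per-class margin with focal loss, as sketched below; the margin schedule, the weights `alpha` and `beta`, and the exact combination used by LMFLOSS are assumptions here, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0) -> torch.Tensor:
    """Standard focal loss: down-weights examples the model already classifies well."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                                   # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def margin_focal_hybrid_loss(logits, targets, class_counts, alpha: float = 0.5,
                             beta: float = 0.5, max_margin: float = 0.5, gamma: float = 2.0):
    """Illustrative hybrid: LDAM-style class-dependent margin term plus focal loss."""
    # Larger margins for rarer classes: margin proportional to n_c^(-1/4), scaled to max_margin.
    margins = 1.0 / class_counts.float() ** 0.25
    margins = (margins * (max_margin / margins.max())).to(logits.device)
    adjusted = logits.clone()
    adjusted[torch.arange(len(targets)), targets] -= margins[targets]   # shrink true-class logit
    margin_loss = F.cross_entropy(adjusted, targets)
    return alpha * margin_loss + beta * focal_loss(logits, targets, gamma)
```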
- UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering [7.2486693553383805]
Current Medical-VQA models learn cross-modal representations with vision and text encoders that reside in two separate embedding spaces.
We propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking.
Experimental results on the public VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA models.
arXiv Detail & Related papers (2022-12-21T02:48:15Z)