TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
- URL: http://arxiv.org/abs/2312.07125v2
- Date: Sun, 4 Feb 2024 17:05:28 GMT
- Title: TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
- Authors: Kaipeng Zheng, Weiran Huang, Lichao Sun
- Abstract summary: Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
- Score: 11.202967500669402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning has been studied to adapt models to tasks with very few
samples. It holds profound significance, particularly in clinical tasks, due to
the high annotation cost of medical images. Several works have explored
few-shot learning on medical images, yet they still require a large number of
medical images for pre-training models to gain domain-specific priors. Vision
foundation models have recently achieved remarkable success on natural images.
Hence, adapting rapidly advancing vision foundation models from natural images
to few-shot clinical tasks holds great promise. MedFMC recently organized a
challenge at NeurIPS 2023 to shed more light on this topic. In this work, we
present our challenge solution. We observe that a simple variant of fine-tuning
with partial freezing shows remarkable performance. Empirical evidence
demonstrates that this approach can outperform various common fine-tuning
methods when samples are limited. Additionally, we explore enhanced
utilization of semantic supervision to boost performance. We propose a novel
approach that contextualizes labels via large language models (LLMs). Our
findings reveal that the context generated by LLMs significantly enhances the
discrimination of semantic embeddings for similar categories, resulting in a
notable performance improvement of 3%-5% in 1-shot settings compared to
commonly employed one-hot labels and other semantic supervision methods. Our
solution secured first place in the MedFMC challenge.
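The two ingredients of the solution can be pictured concretely. Below is a minimal PyTorch sketch of fine-tuning with partial freezing, assuming a timm ViT backbone; which layers to unfreeze is an illustrative assumption, not the authors' released configuration.

```python
# Minimal sketch of fine-tuning with partial freezing (illustrative; the
# exact layers the authors unfreeze are an assumption, not their code).
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)

for p in model.parameters():
    p.requires_grad = False          # freeze the whole backbone first
for p in model.blocks[-1].parameters():
    p.requires_grad = True           # unfreeze only the last transformer block
for p in model.head.parameters():
    p.requires_grad = True           # the classification head is always trained

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

For the semantic supervision, the sketch below assumes each class name is expanded into an LLM-generated description, embedded with an off-the-shelf text encoder, and used as a classification target; the example descriptions, the encoder choice (sentence-transformers MiniLM), and the temperature are all assumptions for illustration.

```python
# Sketch of LLM-contextualized label supervision (hypothetical details):
# class names are expanded into LLM-generated descriptions, embedded, and
# image features are trained to match their class's text embedding.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

class_contexts = [
    "Pneumonia: a chest X-ray finding with patchy lung opacities caused by infection.",
    "Pleural effusion: fluid collected between the lung and the chest wall.",
]  # in practice these descriptions would come from an LLM prompt per class

encoder = SentenceTransformer("all-MiniLM-L6-v2")
label_embeds = F.normalize(torch.tensor(encoder.encode(class_contexts)), dim=-1)

def semantic_loss(image_feats: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over cosine similarity to contextualized label embeddings.
    image_feats must be projected to the text dimension (384 for MiniLM)."""
    logits = F.normalize(image_feats, dim=-1) @ label_embeds.T / 0.07
    return F.cross_entropy(logits, targets)
```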
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
- Few-shot Adaptation of Medical Vision-Language Models [17.11090825001394]
We introduce the first structured benchmark for adapting medical vision-language models (VLMs) in a strict few-shot regime.
We evaluate a simple generalization of the linear-probe adaptation baseline, which seeks an optimal blending of the visual prototypes and text embeddings.
Surprisingly, such a text-informed linear probe yields competitive performance compared to convoluted prompt-learning and adapter-based strategies (a sketch of the blending idea follows this entry).
arXiv Detail & Related papers (2024-09-05T19:10:29Z)
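The text-informed linear probe admits a compact sketch: class weights are a learnable blend of few-shot visual prototypes and class text embeddings. The scalar blend parameterization below is an assumption for illustration, not the paper's exact formulation.

```python
# Sketch of a text-informed linear probe: class weights are a learnable blend
# of few-shot visual prototypes and class text embeddings (CLIP-style,
# L2-normalized). The scalar blend alpha is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextInformedLinearProbe(nn.Module):
    def __init__(self, visual_prototypes: torch.Tensor, text_embeds: torch.Tensor):
        super().__init__()
        self.prototypes = nn.Parameter(visual_prototypes.clone())   # (C, D)
        self.register_buffer("text_embeds", F.normalize(text_embeds, dim=-1))
        self.alpha = nn.Parameter(torch.tensor(0.5))                # learnable blend

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        weights = (self.alpha * F.normalize(self.prototypes, dim=-1)
                   + (1 - self.alpha) * self.text_embeds)
        return F.normalize(feats, dim=-1) @ weights.T               # (N, C) logits
```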
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels (see the adapter sketch after this entry).
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
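A residual adapter of the kind described above can be as small as a zero-initialized bottleneck MLP; the bottleneck width and the placement per encoder level below are assumptions for illustration.

```python
# Sketch of a residual adapter: a small bottleneck MLP whose output is added
# back onto frozen encoder features, one adapter per feature level.
# Bottleneck width and zero-init are illustrative choices.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))   # residual refinement
```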
- One-shot Localization and Segmentation of Medical Images with Foundation Models [7.9060536840474365]
We show that models trained on natural images can offer good performance on medical images.
We leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single shot segmentation.
We also show that our single-shot method outperforms UniverSeg, a recently proposed few-shot segmentation method.
arXiv Detail & Related papers (2023-10-28T08:58:20Z)
- Plug-and-Play Feature Generation for Few-Shot Medical Image Classification [23.969183389866686]
Few-shot learning presents immense potential in enhancing model generalization and practicality for medical image classification with limited training data.
We propose MedMFG, a flexible and lightweight plug-and-play method designed to generate sufficient class-distinctive features from limited samples.
arXiv Detail & Related papers (2023-10-14T02:36:14Z)
- Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications.
We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier (sketched after this entry).
We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z)
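The self-prompting loop above reduces to: a linear per-pixel classifier over the frozen image-encoder embedding yields a coarse foreground map, whose peak is fed back as a prompt. The tensor shapes and the 1x1-conv classifier below are illustrative assumptions.

```python
# Sketch of self-prompting: a linear pixel-wise classifier (a 1x1 conv) over
# the frozen image-encoder embedding gives a coarse foreground map; its peak
# becomes a positive point prompt for the segmentation model. Shapes assumed.
import torch
import torch.nn as nn

embedding = torch.randn(1, 256, 64, 64)               # assumed SAM image-encoder output
pixel_classifier = nn.Conv2d(256, 1, kernel_size=1)   # linear per-pixel head

coarse = pixel_classifier(embedding).sigmoid()[0, 0]  # (64, 64) foreground map
row, col = divmod(coarse.argmax().item(), coarse.shape[1])
point_prompt = (col, row)   # (x, y) point passed back to the model as a prompt
```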
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification [32.67098520984195]
We propose the first diffusion-based model (named DiffMIC) to address general medical image classification.
Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2023-03-19T09:15:45Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)