TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification
- URL: http://arxiv.org/abs/2312.07125v2
- Date: Sun, 4 Feb 2024 17:05:28 GMT
- Title: TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification
- Authors: Kaipeng Zheng, Weiran Huang, Lichao Sun
- Abstract summary: Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
- Score: 11.202967500669402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning has been studied to adapt models to tasks with very few
samples. It holds profound significance, particularly in clinical tasks, due to
the high annotation cost of medical images. Several works have explored
few-shot learning on medical images, yet they still require a large number of
medical images for pre-training models to gain domain-specific priors. Vision
foundation models have recently achieved remarkable success on natural images.
Hence, adapting these rapidly advancing models from natural images to few-shot
clinical tasks holds great promise. MedFMC recently organized a challenge at
NeurIPS 2023 to shed more light on this topic. In this work, we
present our challenge solution. We observe that a simple variant of fine-tuning
with partial freezing shows remarkable performance. Empirical evidence
demonstrates that this approach could outperform various common fine-tuning
methods under limited sample sizes. Additionally, we explore enhanced
utilization of semantic supervision to boost performance. We propose a novel
approach that contextualizes labels via large language models (LLMs). Our
findings reveal that the context generated by LLMs significantly enhances the
discrimination of semantic embeddings for similar categories, resulting in a
notable performance improvement of 3%-5% in 1-shot settings compared to
commonly employed one-hot labels and other semantic supervision methods. Our
solution secures the 1st place in the MedFMC challenge.
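The "fine-tuning with partial freezing" mentioned above can be illustrated with a minimal sketch. The paper does not specify here which layers the authors freeze, so this example assumes a generic PyTorch model in which only the classification head remains trainable; the `partially_freeze` helper and the toy encoder/head model are hypothetical, not the authors' code.

```python
# Minimal sketch of fine-tuning with partial freezing (assumed setup:
# a generic backbone where only the classification head stays trainable).
import torch
import torch.nn as nn

def partially_freeze(model: nn.Module, trainable_prefixes=("head",)) -> int:
    """Freeze every parameter whose name does not start with one of
    `trainable_prefixes`; return the number of trainable parameters."""
    n_trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
        if param.requires_grad:
            n_trainable += param.numel()
    return n_trainable

# Toy stand-in for a vision foundation model: an "encoder" plus a "head".
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),
    "head": nn.Linear(16, 5),
})
n = partially_freeze(model, trainable_prefixes=("head",))
# Only the head's 16*5 weights + 5 biases remain trainable.
print(n)  # 85
```

In a low-shot regime, passing only the parameters with `requires_grad=True` to the optimizer keeps the update small and leverages the frozen backbone's pre-trained priors.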
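The label-contextualization idea can likewise be sketched: rather than one-hot labels, each class is supervised by the embedding of an LLM-generated description, and classification compares an image feature against those semantic targets. The LLM call and the real text encoder are out of scope here; `embed_text` below is a toy deterministic bag-of-words stand-in, and the two class descriptions are illustrative, not the authors' actual prompts.

```python
# Sketch of semantic supervision via contextualized labels (toy stand-ins
# for the real LLM and text encoder).
import numpy as np
import zlib

def embed_text(text: str) -> np.ndarray:
    """Toy deterministic bag-of-words embedding, L2-normalized."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 64] += 1.0
    return vec / np.linalg.norm(vec)

# Hypothetical LLM-generated context for two easily confused classes.
descriptions = [
    "nodule: a small rounded lung opacity smaller than three centimeters",
    "mass: a large soft tissue lesion larger than three centimeters",
]
class_embeddings = np.stack([embed_text(d) for d in descriptions])

def predict(image_feature: np.ndarray) -> int:
    """Assign the class whose semantic embedding has the highest
    cosine similarity with the image feature."""
    feat = image_feature / np.linalg.norm(image_feature)
    return int(np.argmax(class_embeddings @ feat))

# A feature matching the first description maps to class 0.
print(predict(embed_text(descriptions[0])))  # 0
```

The paper's finding is that richer LLM context makes the embeddings of similar categories more discriminative than bare label names, which in this setup means larger margins between the rows of `class_embeddings`.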
Related papers
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- One-shot Localization and Segmentation of Medical Images with Foundation Models [7.9060536840474365]
We show that the models trained on natural images can offer good performance on medical images.
We leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single-shot segmentation.
We also show that our single-shot method outperforms the recently proposed few-shot segmentation method, UniverSeg.
arXiv Detail & Related papers (2023-10-28T08:58:20Z)
- Plug-and-Play Feature Generation for Few-Shot Medical Image Classification [23.969183389866686]
Few-shot learning presents immense potential in enhancing model generalization and practicality for medical image classification with limited training data.
We propose MedMFG, a flexible and lightweight plug-and-play method designed to generate sufficient class-distinctive features from limited samples.
arXiv Detail & Related papers (2023-10-14T02:36:14Z)
- Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications.
We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier.
We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification [32.67098520984195]
We propose the first diffusion-based model (named DiffMIC) to address general medical image classification.
Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2023-03-19T09:15:45Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.