TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification
- URL: http://arxiv.org/abs/2312.07125v2
- Date: Sun, 4 Feb 2024 17:05:28 GMT
- Title: TransMed: Large Language Models Enhance Vision Transformer for
Biomedical Image Classification
- Authors: Kaipeng Zheng, Weiran Huang, Lichao Sun
- Abstract summary: Few-shot learning has been studied to adapt models to tasks with very few samples.
We propose a novel approach that contextualizes labels via large language models (LLMs).
Our findings reveal that the context generated by LLMs significantly enhances the discrimination of semantic embeddings for similar categories.
- Score: 11.202967500669402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning has been studied to adapt models to tasks with very few
samples. It holds profound significance, particularly in clinical tasks, due to
the high annotation cost of medical images. Several works have explored
few-shot learning on medical images, yet they still require a large number of
medical images for pre-training models to gain domain-specific priors. Vision
foundation models have recently achieved remarkable success on natural images.
Hence, adapting these rapidly advancing models from natural images to few-shot
clinical tasks holds great promise. MedFMC recently organized a challenge at
NeurIPS 2023 to shed more light on this topic. In this work, we
present our challenge solution. We observe that a simple variant of fine-tuning
with partial freezing shows remarkable performance. Empirical evidence
demonstrates that this approach could outperform various common fine-tuning
methods under limited sample sizes. Additionally, we explore enhanced
utilization of semantic supervision to boost performance. We propose a novel
approach that contextualizes labels via large language models (LLMs). Our
findings reveal that the context generated by LLMs significantly enhances the
discrimination of semantic embeddings for similar categories, resulting in a
notable performance improvement of 3%-5% in 1-shot settings compared to
commonly employed one-hot labels and other semantic supervision methods. Our
solution secures the 1st place in the MedFMC challenge.
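The "fine-tuning with partial freezing" mentioned above can be illustrated with a minimal sketch. The paper does not specify here which layers the authors freeze, so this example assumes a generic PyTorch model in which only the classification head remains trainable; the `partially_freeze` helper and the toy encoder/head model are hypothetical, not the authors' code.

```python
# Minimal sketch of fine-tuning with partial freezing (assumed setup:
# a generic backbone where only the classification head stays trainable).
import torch
import torch.nn as nn

def partially_freeze(model: nn.Module, trainable_prefixes=("head",)) -> int:
    """Freeze every parameter whose name does not start with one of
    `trainable_prefixes`; return the number of trainable parameters."""
    n_trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
        if param.requires_grad:
            n_trainable += param.numel()
    return n_trainable

# Toy stand-in for a vision foundation model: an "encoder" plus a "head".
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),
    "head": nn.Linear(16, 5),
})
n = partially_freeze(model, trainable_prefixes=("head",))
# Only the head's 16*5 weights + 5 biases remain trainable.
print(n)  # 85
```

In a low-shot regime, passing only the parameters with `requires_grad=True` to the optimizer keeps the update small and leverages the frozen backbone's pre-trained priors.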
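The label-contextualization idea can likewise be sketched: rather than one-hot labels, each class is supervised by the embedding of an LLM-generated description, and classification compares an image feature against those semantic targets. The LLM call and the real text encoder are out of scope here; `embed_text` below is a toy deterministic bag-of-words stand-in, and the two class descriptions are illustrative, not the authors' actual prompts.

```python
# Sketch of semantic supervision via contextualized labels (toy stand-ins
# for the real LLM and text encoder).
import numpy as np
import zlib

def embed_text(text: str) -> np.ndarray:
    """Toy deterministic bag-of-words embedding, L2-normalized."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 64] += 1.0
    return vec / np.linalg.norm(vec)

# Hypothetical LLM-generated context for two easily confused classes.
descriptions = [
    "nodule: a small rounded lung opacity smaller than three centimeters",
    "mass: a large soft tissue lesion larger than three centimeters",
]
class_embeddings = np.stack([embed_text(d) for d in descriptions])

def predict(image_feature: np.ndarray) -> int:
    """Assign the class whose semantic embedding has the highest
    cosine similarity with the image feature."""
    feat = image_feature / np.linalg.norm(image_feature)
    return int(np.argmax(class_embeddings @ feat))

# A feature matching the first description maps to class 0.
print(predict(embed_text(descriptions[0])))  # 0
```

The paper's finding is that richer LLM context makes the embeddings of similar categories more discriminative than bare label names, which in this setup means larger margins between the rows of `class_embeddings`.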
Related papers
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- One-shot Localization and Segmentation of Medical Images with Foundation Models [7.9060536840474365]
We show that the models trained on natural images can offer good performance on medical images.
We leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single-shot segmentation.
We also show that our single-shot method outperforms the recently proposed few-shot segmentation method, UniverSeg.
arXiv Detail & Related papers (2023-10-28T08:58:20Z)
- Plug-and-Play Feature Generation for Few-Shot Medical Image Classification [23.969183389866686]
Few-shot learning presents immense potential in enhancing model generalization and practicality for medical image classification with limited training data.
We propose MedMFG, a flexible and lightweight plug-and-play method designed to generate sufficient class-distinctive features from limited samples.
arXiv Detail & Related papers (2023-10-14T02:36:14Z)
- Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications.
We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier.
We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification [32.67098520984195]
We propose the first diffusion-based model (named DiffMIC) to address general medical image classification.
Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2023-03-19T09:15:45Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.