MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
- URL: http://arxiv.org/abs/2311.08236v2
- Date: Mon, 22 Jul 2024 05:39:53 GMT
- Title: MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
- Authors: Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang,
- Abstract summary: Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities.
MeLo (Medical image Low-rank adaptation) adopts low-rank adaptation instead of resource-demanding fine-tuning.
Our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets.
- Score: 63.59184480010552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The common practice in developing computer-aided diagnosis (CAD) models based on transformer architectures usually involves fine-tuning from ImageNet pre-trained weights. However, with recent advances in large-scale pre-training and the practice of scaling laws, Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities. Additionally, in real-world scenarios, the deployments of multiple CAD models can be troublesome due to problems such as limited storage space and time-consuming model switching. To address these challenges, we propose a new method MeLo (Medical image Low-rank adaptation), which enables the development of a single CAD model for multiple clinical tasks in a lightweight manner. It adopts low-rank adaptation instead of resource-demanding fine-tuning. By fixing the weight of ViT models and only adding small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities using only a few trainable parameters. Specifically, our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets using about 0.17% trainable parameters. Moreover, MeLo adds only about 0.5MB of storage space and allows for extremely fast model switching in deployment and inference. Our source code and pre-trained weights are available on our website (https://absterzhu.github.io/melo.github.io/).
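As a rough illustration of the low-rank adaptation idea described in the abstract (a minimal sketch, not the authors' released MeLo code: the LoRALinear class, the rank/alpha values, and the timm backbone usage are illustrative assumptions), a frozen linear layer of a ViT can be wrapped with a small trainable low-rank plug-in:

```python
# Minimal sketch of low-rank adaptation on a frozen linear layer.
# Not the MeLo implementation; names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        # Only these two small matrices are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank plug-in path.
        return self.base(x) + self.scale * (x @ self.lora_A.t() @ self.lora_B.t())

# Hypothetical usage with a timm ViT backbone (module names depend on the
# backbone implementation):
# import timm
# vit = timm.create_model("vit_base_patch16_224", pretrained=True)
# for block in vit.blocks:
#     block.attn.qkv = LoRALinear(block.attn.qkv, rank=4)
# trainable = [p for p in vit.parameters() if p.requires_grad]
```

Since each task contributes only the two small matrices per adapted layer, switching between diagnosis tasks in deployment amounts to loading a different adapter (roughly 0.5MB per task, per the abstract) while the shared ViT backbone stays fixed in memory.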
Related papers
- LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation [16.604140484767377]
This paper proposes a lightweight and vanilla model called LV-UNet, which effectively utilizes pre-trained MobileNetv3-Large models and introduces inference modules.
Experiments are conducted on the ISIC 2016, BUSI, CVC-ClinicDB, and CVC-SEG datasets, achieving better performance than state-of-the-art and classic models.
arXiv Detail & Related papers (2024-08-29T20:19:10Z) - LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and only 0.42 GFLOPs.
arXiv Detail & Related papers (2024-04-04T01:59:19Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis [30.608225734194416]
We propose a dynamic visual prompt tuning method, named DVPT, for medical image analysis.
It can extract knowledge beneficial to downstream tasks from large models with a few trainable parameters.
It can save up to 60% of labeled data and 99% of the storage cost of ViT-B/16.
arXiv Detail & Related papers (2023-07-19T07:11:11Z) - DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We direct effort toward efficient adaptation of existing models, proposing to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - AIM: Adapting Image Models for Efficient Video Action Recognition [22.805026175928997]
We propose a method to Adapt pre-trained Image Models (AIM) for efficient video understanding.
By freezing the pre-trained image model and adding a few lightweight Adapters, we introduce spatial adaptation, temporal adaptation and joint adaptation.
We show that our proposed AIM can achieve competitive or even better performance than prior arts with substantially fewer tunable parameters.
arXiv Detail & Related papers (2023-02-06T18:59:17Z) - PatchDropout: Economizing Vision Transformers Using Patch Dropout [9.243684409949436]
We show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches.
We observe a 5x savings in computation and memory when using PatchDropout, along with a boost in performance.
arXiv Detail & Related papers (2022-08-10T14:08:55Z) - MiniViT: Compressing Vision Transformers with Weight Multiplexing [88.54212027516755]
Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability.
MiniViT is a new compression framework, which achieves parameter reduction in vision transformers while retaining the same performance.
arXiv Detail & Related papers (2022-04-14T17:59:05Z) - Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.