MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
- URL: http://arxiv.org/abs/2311.08236v2
- Date: Mon, 22 Jul 2024 05:39:53 GMT
- Title: MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
- Authors: Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang,
- Abstract summary: Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities.
MeLo (Medical image Low-rank adaptation) adopts low-rank adaptation instead of resource-demanding fine-tuning.
Our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets.
- Score: 63.59184480010552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The common practice in developing computer-aided diagnosis (CAD) models based on transformer architectures usually involves fine-tuning from ImageNet pre-trained weights. However, with recent advances in large-scale pre-training and the practice of scaling laws, Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities. Additionally, in real-world scenarios, the deployments of multiple CAD models can be troublesome due to problems such as limited storage space and time-consuming model switching. To address these challenges, we propose a new method MeLo (Medical image Low-rank adaptation), which enables the development of a single CAD model for multiple clinical tasks in a lightweight manner. It adopts low-rank adaptation instead of resource-demanding fine-tuning. By fixing the weight of ViT models and only adding small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities using only a few trainable parameters. Specifically, our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets using about 0.17% trainable parameters. Moreover, MeLo adds only about 0.5MB of storage space and allows for extremely fast model switching in deployment and inference. Our source code and pre-trained weights are available on our website (https://absterzhu.github.io/melo.github.io/).
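As a rough illustration of the low-rank adaptation idea described in the abstract (a minimal sketch, not the authors' released MeLo code: the LoRALinear class, the rank/alpha values, and the timm backbone usage are illustrative assumptions), a frozen linear layer of a ViT can be wrapped with a small trainable low-rank plug-in:

```python
# Minimal sketch of low-rank adaptation on a frozen linear layer.
# Not the MeLo implementation; names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        # Only these two small matrices are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank plug-in path.
        return self.base(x) + self.scale * (x @ self.lora_A.t() @ self.lora_B.t())

# Hypothetical usage with a timm ViT backbone (module names depend on the
# backbone implementation):
# import timm
# vit = timm.create_model("vit_base_patch16_224", pretrained=True)
# for block in vit.blocks:
#     block.attn.qkv = LoRALinear(block.attn.qkv, rank=4)
# trainable = [p for p in vit.parameters() if p.requires_grad]
```

Since each task contributes only the two small matrices per adapted layer, switching between diagnosis tasks in deployment amounts to loading a different adapter (roughly 0.5MB per task, per the abstract) while the shared ViT backbone stays fixed in memory.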
Related papers
- LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation [16.604140484767377]
This paper proposes a lightweight and vanilla model called LV-UNet, which effectively utilizes pre-trained MobileNetv3-Large models and introduces inference modules.
Experiments are conducted on the ISIC 2016, BUSI, CVC-ClinicDB, and CVC-SEG datasets, achieving better performance than state-of-the-art and classic models.
arXiv Detail & Related papers (2024-08-29T20:19:10Z) - LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and only 0.42 GFLOPs.
arXiv Detail & Related papers (2024-04-04T01:59:19Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis [30.608225734194416]
We propose a dynamic visual prompt tuning method, named DVPT, for medical image analysis.
It can extract knowledge beneficial to downstream tasks from large models with a few trainable parameters.
It can save up to 60% of labeled data and 99% of the storage cost of ViT-B/16.
arXiv Detail & Related papers (2023-07-19T07:11:11Z) - DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We direct effort toward efficient adaptation of existing models, proposing to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - AIM: Adapting Image Models for Efficient Video Action Recognition [22.805026175928997]
We propose a method to Adapt pre-trained Image Models (AIM) for efficient video understanding.
By freezing the pre-trained image model and adding a few lightweight Adapters, we introduce spatial adaptation, temporal adaptation and joint adaptation.
We show that our proposed AIM can achieve competitive or even better performance than prior arts with substantially fewer tunable parameters.
arXiv Detail & Related papers (2023-02-06T18:59:17Z) - PatchDropout: Economizing Vision Transformers Using Patch Dropout [9.243684409949436]
We show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches.
We observe a 5x savings in computation and memory when using PatchDropout, along with a boost in performance.
arXiv Detail & Related papers (2022-08-10T14:08:55Z) - MiniViT: Compressing Vision Transformers with Weight Multiplexing [88.54212027516755]
Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability.
MiniViT is a new compression framework, which achieves parameter reduction in vision transformers while retaining the same performance.
arXiv Detail & Related papers (2022-04-14T17:59:05Z) - Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.