DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for
Medical Image Analysis
- URL: http://arxiv.org/abs/2307.09787v1
- Date: Wed, 19 Jul 2023 07:11:11 GMT
- Title: DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for
Medical Image Analysis
- Authors: Along He, Kai Wang, Zhihong Wang, Tao Li, and Huazhu Fu
- Abstract summary: We propose a dynamic visual prompt tuning method, named DVPT, for medical image analysis.
It can extract knowledge beneficial to downstream tasks from large models with a few trainable parameters.
It can save up to 60% of labeled data and 99% of the storage cost of ViT-B/16.
- Score: 30.608225734194416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Limited labeled data makes it hard to train models from scratch in the medical
domain, so an important paradigm is pre-training followed by fine-tuning. Large
pre-trained models contain rich representations, which can be adapted to
downstream medical tasks. However, existing methods either tune all the
parameters or the task-specific layers of the pre-trained models, ignoring the
input variations of medical images, and thus they are not efficient or
effective. In this work, we aim to study parameter-efficient fine-tuning (PEFT)
for medical image analysis, and propose a dynamic visual prompt tuning method,
named DVPT. It can extract knowledge beneficial to downstream tasks from large
models with a few trainable parameters. Firstly, the frozen features are
transformed by a lightweight bottleneck layer to learn the domain-specific
distribution of downstream medical tasks; then a few learnable visual
prompts are used as dynamic queries to conduct cross-attention with the
transformed features, acquiring sample-specific knowledge that is
suitable for each sample. Finally, the features are projected back to the
original feature dimension and aggregated with the frozen features. This DVPT module can
be shared between different Transformer layers, further reducing the trainable
parameters. To validate DVPT, we conduct extensive experiments with different
pre-trained models on medical classification and segmentation tasks. We find
that such a PEFT method can not only efficiently adapt pre-trained models to the
medical domain, but also bring data efficiency with partially labeled data. For
example, with 0.5% extra trainable parameters, our method not only outperforms
state-of-the-art PEFT methods but also surpasses full fine-tuning by more than
2.20% in Kappa score on a medical classification task. It saves up to 60% of the
labeled data and 99% of the storage cost of ViT-B/16.
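The pipeline described in the abstract (bottleneck transform, prompts as cross-attention queries, projection back, residual aggregation) can be sketched as a single forward pass. This is a minimal NumPy illustration under assumed shapes and random weights; the bottleneck dimension, prompt count, and the mean-pooling used to fold the prompt outputs back into the token features are guesses for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, D, d, k = 197, 768, 64, 8   # tokens, ViT-B dim, bottleneck dim, prompts (assumed)

# Stand-in for frozen features coming out of a pre-trained Transformer layer.
frozen = rng.standard_normal((N, D))

# Trainable parameters of the sketched DVPT-style module.
W_down  = rng.standard_normal((D, d)) * 0.02   # lightweight bottleneck
W_up    = rng.standard_normal((d, D)) * 0.02   # projection back to dimension D
prompts = rng.standard_normal((k, d)) * 0.02   # learnable visual prompts

def dvpt(frozen):
    z = frozen @ W_down                          # learn domain-specific distribution
    attn = softmax(prompts @ z.T / np.sqrt(d))   # prompts act as dynamic queries
    ctx = attn @ z                               # sample-specific knowledge, shape (k, d)
    # Fold the prompt outputs back into every token and aggregate residually
    # (mean-pooling over prompts is a simplifying assumption here).
    return frozen + ctx.mean(axis=0) @ W_up

out = dvpt(frozen)
```

Note how small the trainable budget is: the two projections and the prompts total roughly 2·D·d + k·d ≈ 0.2M parameters against the 86M of ViT-B/16, and since the module is input-dependent through the cross-attention, the adaptation varies per sample.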
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
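The frequency-domain idea can be sketched briefly: learnable prompt embeddings are passed through a Fourier transform before being prepended to the patch sequence. Treating all prompts as Fourier-transformed (VFPT also keeps spatial prompts) and the shapes below are simplifying assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
k, D = 10, 768                                  # prompt tokens, embedding dim (assumed)
prompts = rng.standard_normal((k, D)) * 0.02    # learnable prompt embeddings

# Apply a 2D FFT so the prompts carry frequency-domain information;
# keeping only the real part preserves the (k, D) shape.
fourier_prompts = np.fft.fft2(prompts).real

patch_embeddings = rng.standard_normal((196, D))  # stand-in for frozen patch tokens
sequence = np.concatenate([fourier_prompts, patch_embeddings], axis=0)
```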
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification [1.5791081894226173]
Fine-grained Prompt Tuning plus (FPT+) is a PETL method designed for high-resolution medical image classification.
FPT+ performs transfer learning by training a lightweight side network and accessing pre-trained knowledge from a large pre-trained model.
Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model.
arXiv Detail & Related papers (2024-08-05T12:33:07Z)
- Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification [16.070261684997362]
Fine-tuning pre-trained models for various downstream tasks is a critical problem in the medical imaging domain.
The large size of these models necessitates parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning.
In this work, we investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model for medical image classification.
arXiv Detail & Related papers (2024-07-16T10:28:50Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images [18.094731760514264]
We study the effectiveness of fine-tuning methods when adapting foundation models to medical image classification tasks.
We propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels.
EPT outperforms several state-of-the-art finetuning methods by a significant margin on few-shot medical image classification tasks.
arXiv Detail & Related papers (2024-07-01T06:35:53Z)
- MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis [63.59184480010552]
Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities.
MeLo (Medical image Low-rank adaptation) adopts low-rank adaptation instead of resource-demanding fine-tuning.
Our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets.
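Low-rank adaptation, the mechanism MeLo builds on, keeps the frozen weight matrix W and learns only a rank-r update BA, scaled by alpha/r. A minimal NumPy sketch (dimensions, rank, and scaling are typical LoRA choices, not MeLo's reported configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, r, alpha = 768, 8, 16                 # hidden dim, rank, scaling (assumed typical)

W = rng.standard_normal((D, D))          # frozen pre-trained weight
A = rng.standard_normal((r, D)) * 0.01   # trainable down-projection
B = np.zeros((D, r))                     # trainable up-projection, zero-initialised

def lora_forward(x):
    # Frozen path plus low-rank update; only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, D))
y = lora_forward(x)
```

Because B starts at zero, the adapted layer is exactly the frozen layer at initialisation, and the trainable parameters per layer are 2·r·D ≈ 2% of the D×D matrix they modify.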
arXiv Detail & Related papers (2023-11-14T15:18:54Z)
- Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on the adapter-tuning and hypernetwork.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z)
- Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation [37.42382366505377]
We introduce a new framework named Med-Tuning to realize parameter-efficient tuning (PET) for medical volumetric segmentation task.
Our framework enhances the segmentation precision of 2D baselines pre-trained on natural images.
Compared to full FT, Med-Tuning reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance.
arXiv Detail & Related papers (2023-04-21T10:47:13Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
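The "less than 1% of model parameters" claim is easy to verify arithmetically: in the shallow VPT variant, only k prompt vectors of the embedding dimension are trained while the backbone stays frozen. The prompt count below is an assumed typical setting.

```python
# VPT-style trainable-parameter budget for ViT-B/16 (shallow variant).
k, D = 50, 768                  # 50 prompts of dim 768 is an assumed typical setting
backbone_params = 86_000_000    # approximate ViT-B/16 parameter count

prompt_params = k * D           # 38,400 trainable parameters in the input space
fraction = prompt_params / backbone_params
```

With these numbers the trainable fraction is about 0.045%, comfortably under the 1% bound stated above; even the deep variant, which adds prompts at every one of the 12 layers, stays below 1%.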
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.