DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for
Medical Image Analysis
- URL: http://arxiv.org/abs/2307.09787v1
- Date: Wed, 19 Jul 2023 07:11:11 GMT
- Title: DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for
Medical Image Analysis
- Authors: Along He, Kai Wang, Zhihong Wang, Tao Li, and Huazhu Fu
- Abstract summary: We propose a dynamic visual prompt tuning method, named DVPT, for medical image analysis.
It can extract knowledge beneficial to downstream tasks from large models with a few trainable parameters.
It can save up to 60% of labeled data and 99% of the storage cost of ViT-B/16.
- Score: 30.608225734194416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Limited labeled data makes it hard to train models from scratch in the medical
domain, so an important paradigm is pre-training followed by fine-tuning. Large
pre-trained models contain rich representations, which can be adapted to
downstream medical tasks. However, existing methods either tune all the
parameters or the task-specific layers of the pre-trained models, ignoring the
input variations of medical images, and thus they are not efficient or
effective. In this work, we aim to study parameter-efficient fine-tuning (PEFT)
for medical image analysis, and propose a dynamic visual prompt tuning method,
named DVPT. It can extract knowledge beneficial to downstream tasks from large
models with a few trainable parameters. Firstly, the frozen features are
transformed by a lightweight bottleneck layer to learn the domain-specific
distribution of downstream medical tasks; then a few learnable visual
prompts are used as dynamic queries to conduct cross-attention with the
transformed features, acquiring sample-specific knowledge that is
suitable for each sample. Finally, the features are projected back to the
original feature dimension and aggregated with the frozen features. This DVPT module can
be shared between different Transformer layers, further reducing the trainable
parameters. To validate DVPT, we conduct extensive experiments with different
pre-trained models on medical classification and segmentation tasks. We find
that such a PEFT method can not only efficiently adapt pre-trained models to the
medical domain, but also bring data efficiency with partially labeled data. For
example, with 0.5% extra trainable parameters, our method not only outperforms
state-of-the-art PEFT methods but also surpasses full fine-tuning by more than
2.20% in Kappa score on a medical classification task. It saves up to 60% of the
labeled data and 99% of the storage cost of ViT-B/16.
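The pipeline described in the abstract (bottleneck transform, prompts as cross-attention queries, projection back, residual aggregation) can be sketched as a single forward pass. This is a minimal NumPy illustration under assumed shapes and random weights; the bottleneck dimension, prompt count, and the mean-pooling used to fold the prompt outputs back into the token features are guesses for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, D, d, k = 197, 768, 64, 8   # tokens, ViT-B dim, bottleneck dim, prompts (assumed)

# Stand-in for frozen features coming out of a pre-trained Transformer layer.
frozen = rng.standard_normal((N, D))

# Trainable parameters of the sketched DVPT-style module.
W_down  = rng.standard_normal((D, d)) * 0.02   # lightweight bottleneck
W_up    = rng.standard_normal((d, D)) * 0.02   # projection back to dimension D
prompts = rng.standard_normal((k, d)) * 0.02   # learnable visual prompts

def dvpt(frozen):
    z = frozen @ W_down                          # learn domain-specific distribution
    attn = softmax(prompts @ z.T / np.sqrt(d))   # prompts act as dynamic queries
    ctx = attn @ z                               # sample-specific knowledge, shape (k, d)
    # Fold the prompt outputs back into every token and aggregate residually
    # (mean-pooling over prompts is a simplifying assumption here).
    return frozen + ctx.mean(axis=0) @ W_up

out = dvpt(frozen)
```

Note how small the trainable budget is: the two projections and the prompts total roughly 2·D·d + k·d ≈ 0.2M parameters against the 86M of ViT-B/16, and since the module is input-dependent through the cross-attention, the adaptation varies per sample.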
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
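The frequency-domain idea can be sketched briefly: learnable prompt embeddings are passed through a Fourier transform before being prepended to the patch sequence. Treating all prompts as Fourier-transformed (VFPT also keeps spatial prompts) and the shapes below are simplifying assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
k, D = 10, 768                                  # prompt tokens, embedding dim (assumed)
prompts = rng.standard_normal((k, D)) * 0.02    # learnable prompt embeddings

# Apply a 2D FFT so the prompts carry frequency-domain information;
# keeping only the real part preserves the (k, D) shape.
fourier_prompts = np.fft.fft2(prompts).real

patch_embeddings = rng.standard_normal((196, D))  # stand-in for frozen patch tokens
sequence = np.concatenate([fourier_prompts, patch_embeddings], axis=0)
```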
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification [1.5791081894226173]
Fine-grained Prompt Tuning plus (FPT+) is a PETL method designed for high-resolution medical image classification.
FPT+ performs transfer learning by training a lightweight side network and accessing pre-trained knowledge from a large pre-trained model.
Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model.
arXiv Detail & Related papers (2024-08-05T12:33:07Z)
- Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification [16.070261684997362]
Fine-tuning pre-trained models for various downstream tasks is a critical problem in the medical imaging domain.
The large size of these models necessitates parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning.
In this work, we investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model for medical image classification.
arXiv Detail & Related papers (2024-07-16T10:28:50Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images [18.094731760514264]
We study the effectiveness of fine-tuning methods when adapting foundation models to medical image classification tasks.
We propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels.
EPT outperforms several state-of-the-art finetuning methods by a significant margin on few-shot medical image classification tasks.
arXiv Detail & Related papers (2024-07-01T06:35:53Z)
- MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis [63.59184480010552]
Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities.
MeLo (Medical image Low-rank adaptation) adopts low-rank adaptation instead of resource-demanding fine-tuning.
Our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets.
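Low-rank adaptation, the mechanism MeLo builds on, keeps the frozen weight matrix W and learns only a rank-r update BA, scaled by alpha/r. A minimal NumPy sketch (dimensions, rank, and scaling are typical LoRA choices, not MeLo's reported configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, r, alpha = 768, 8, 16                 # hidden dim, rank, scaling (assumed typical)

W = rng.standard_normal((D, D))          # frozen pre-trained weight
A = rng.standard_normal((r, D)) * 0.01   # trainable down-projection
B = np.zeros((D, r))                     # trainable up-projection, zero-initialised

def lora_forward(x):
    # Frozen path plus low-rank update; only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, D))
y = lora_forward(x)
```

Because B starts at zero, the adapted layer is exactly the frozen layer at initialisation, and the trainable parameters per layer are 2·r·D ≈ 2% of the D×D matrix they modify.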
arXiv Detail & Related papers (2023-11-14T15:18:54Z)
- Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on the adapter-tuning and hypernetwork.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z)
- Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation [37.42382366505377]
We introduce a new framework named Med-Tuning to realize parameter-efficient tuning (PET) for medical volumetric segmentation task.
Our framework enhances the segmentation precision of 2D baselines pre-trained on natural images.
Compared to full FT, Med-Tuning reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance.
arXiv Detail & Related papers (2023-04-21T10:47:13Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
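The "less than 1% of model parameters" claim is easy to verify arithmetically: in the shallow VPT variant, only k prompt vectors of the embedding dimension are trained while the backbone stays frozen. The prompt count below is an assumed typical setting.

```python
# VPT-style trainable-parameter budget for ViT-B/16 (shallow variant).
k, D = 50, 768                  # 50 prompts of dim 768 is an assumed typical setting
backbone_params = 86_000_000    # approximate ViT-B/16 parameter count

prompt_params = k * D           # 38,400 trainable parameters in the input space
fraction = prompt_params / backbone_params
```

With these numbers the trainable fraction is about 0.045%, comfortably under the 1% bound stated above; even the deep variant, which adds prompts at every one of the 12 layers, stays below 1%.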
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.