PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation
- URL: http://arxiv.org/abs/2404.13704v1
- Date: Sun, 21 Apr 2024 16:29:49 GMT
- Title: PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation
- Authors: Nada Saadi, Numan Saeed, Mohammad Yaqub, Karthik Nandakumar
- Abstract summary: When both CT and PET scans are available, it is common to combine them as two channels of the input to the segmentation model.
This method requires both scan types during training and inference, posing a challenge due to the limited availability of PET scans.
We propose a parameter-efficient multi-modal adaptation framework for lightweight upgrading of a transformer-based segmentation model.
- Score: 5.056996354878645
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Imaging modalities such as Computed Tomography (CT) and Positron Emission Tomography (PET) are key in cancer detection, inspiring Deep Neural Network (DNN) models that merge these scans for tumor segmentation. When both CT and PET scans are available, it is common to combine them as two channels of the input to the segmentation model. However, this method requires both scan types during training and inference, posing a challenge due to the limited availability of PET scans, which sometimes limits the process to CT scans only. Hence, there is a need for a flexible DNN architecture that can be trained/updated using only CT scans but can effectively utilize PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for lightweight upgrading of a transformer-based segmentation model trained only on CT scans so that it also incorporates PET scans. The benefits of the proposed approach are two-fold. First, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) of the attention weights to achieve parameter-efficient adaptation. Second, since the PEMMA framework attempts to minimize cross-modal entanglement, the combined model can subsequently be updated using only one modality without catastrophic forgetting of the other. Our proposed method achieves performance comparable to early fusion techniques with just 8% of the trainable parameters, and notably yields a +28% improvement in the average Dice score on PET scans when trained on a single modality.
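As a rough illustration of the two ideas in the abstract (low-rank adaptation of attention weights, and lightweight upgrading of a CT-only model to also accept PET), the PyTorch sketch below is a minimal, hypothetical example rather than the authors' implementation; the layer sizes, the `rank`/`alpha` values, and the helper names (`LoRALinear`, `add_pet_channel`) are assumptions made for illustration.

```python
# Illustrative sketch only -- not the authors' code. Shows (1) LoRA on an attention
# projection and (2) growing a CT-only input stem to accept an extra PET channel
# while the original weights stay frozen.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update W + B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # original weights are frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def add_pet_channel(patch_embed: nn.Conv3d) -> nn.Conv3d:
    """Extend a 1-channel (CT) patch-embedding conv to 2 channels (CT + PET).

    The CT filters are copied; the new PET slice is zero-initialised so the
    upgraded model initially behaves exactly like the CT-only model.
    """
    new = nn.Conv3d(2, patch_embed.out_channels,
                    kernel_size=patch_embed.kernel_size,
                    stride=patch_embed.stride,
                    padding=patch_embed.padding,
                    bias=patch_embed.bias is not None)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :1] = patch_embed.weight    # reuse learned CT filters
        if patch_embed.bias is not None:
            new.bias.copy_(patch_embed.bias)
    return new


# Example: adapt the qkv projection of one transformer block and the input stem.
qkv = nn.Linear(768, 3 * 768)                      # pretrained, CT-only
adapted_qkv = LoRALinear(qkv, rank=8)
tokens = torch.randn(2, 196, 768)
print(adapted_qkv(tokens).shape)                   # torch.Size([2, 196, 2304])

ct_stem = nn.Conv3d(1, 48, kernel_size=2, stride=2)
ctpet_stem = add_pet_channel(ct_stem)              # now accepts (B, 2, D, H, W)
```

In a setup like this, only the LoRA matrices and the newly added PET weights would be trained, which is what would keep the adaptation parameter-efficient.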
Related papers
- Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction [3.74142789780782]
Motivated by the potential of Parameter-Efficient Fine-Tuning (PEFT), we aim to address these issues by effectively leveraging PEFT to make the most of limited data.
In this paper, we introduce PETITE, Parameter-Efficient Fine-Tuning for MultI-scanner PET to PET REconstruction, which uses fewer than 1% of the parameters.
arXiv Detail & Related papers (2024-07-10T10:12:26Z)
- Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks [40.986069119392944]
We propose MX-ARM, a multimodal MiXture-of-experts Alignment and Reconstruction Model.
It is modality-detachable and exchangeable, dynamically allocating different multi-layer perceptrons ("mixture of experts") through learnable weights to learn respective representations from different modalities.
arXiv Detail & Related papers (2024-03-29T08:47:49Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction [62.29541106695824]
This paper presents a coarse-to-fine PET reconstruction framework that consists of a coarse prediction module (CPM) and an iterative refinement module (IRM).
By delegating most of the computational overhead to the CPM, the overall sampling speed of our method can be significantly improved.
Two additional strategies, i.e., an auxiliary guidance strategy and a contrastive diffusion strategy, are proposed and integrated into the reconstruction process.
arXiv Detail & Related papers (2023-08-20T04:10:36Z)
- 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively, and achieves similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z)
- SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images [6.936329289469511]
The Cross-Modal Swin Transformer (SwinCross), with a cross-modal attention (CMA) module, incorporates cross-modal feature extraction at multiple resolutions.
The proposed method is experimentally shown to outperform state-of-the-art transformer-based methods.
arXiv Detail & Related papers (2023-02-08T03:36:57Z)
- Spatiotemporal Feature Learning Based on Two-Step LSTM and Transformer for CT Scans [2.3682456328966115]
We propose a novel and effective two-step approach to thoroughly tackle this issue for COVID-19 symptom classification.
First, the semantic feature embedding of each slice for a CT scan is extracted by conventional backbone networks.
Then, we propose a long short-term memory (LSTM) and Transformer-based sub-network for temporal feature learning.
arXiv Detail & Related papers (2022-07-04T16:59:05Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE to brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages to train accurate diagnosis models by learning from multiple source tasks and data from different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
- Multimodal Spatial Attention Module for Targeting Multimodal PET-CT Lung Tumor Segmentation [11.622615048002567]
The multimodal spatial attention module (MSAM) learns to emphasize regions related to tumors.
MSAM can be applied to common backbone architectures and trained end-to-end; a minimal illustrative sketch follows this entry.
arXiv Detail & Related papers (2020-07-29T10:27:22Z)
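As a rough, hypothetical sketch of the MSAM idea summarized above (not the paper's implementation), the block below predicts a spatial attention map from the PET volume and uses it to reweight CT backbone features; the channel counts, kernel sizes, and class name are illustrative assumptions.

```python
# Hypothetical sketch of an MSAM-style block, not the paper's implementation:
# a PET-driven attention map reweights CT backbone features.
import torch
import torch.nn as nn


class SpatialAttentionFromPET(nn.Module):
    """Predicts a [0, 1] spatial map from PET and applies it to backbone features."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, 1, kernel_size=1),
            nn.Sigmoid(),                      # attention values in [0, 1]
        )

    def forward(self, pet: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        attn = self.net(pet)                   # (B, 1, D, H, W)
        attn = nn.functional.interpolate(      # match the feature resolution
            attn, size=feats.shape[2:], mode="trilinear", align_corners=False)
        return feats * attn                    # emphasise tumor-related regions


# Example with toy tensors: full-resolution PET, downsampled CT backbone features.
pet = torch.randn(1, 1, 64, 64, 64)
ct_feats = torch.randn(1, 32, 16, 16, 16)
msam = SpatialAttentionFromPET()
print(msam(pet, ct_feats).shape)               # torch.Size([1, 32, 16, 16, 16])
```

Because the attention branch only reads the PET volume, a module like this could be attached to an existing CT backbone without changing the backbone itself, which matches the summary's claim that MSAM works with common architectures.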
This list is automatically generated from the titles and abstracts of the papers in this site.