Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for
Semantic Segmentation in Medical Imaging
- URL: http://arxiv.org/abs/2303.07126v1
- Date: Mon, 13 Mar 2023 13:57:29 GMT
- Title: Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for
Semantic Segmentation in Medical Imaging
- Authors: Zdravko Marinov, Simon Reiß, David Kersting, Jens Kleesiek, Rainer
Stiefelhagen
- Abstract summary: We propose Mirror U-Net, which replaces traditional fusion methods with multimodal fission.
Mirror U-Net assigns a task tailored to each modality to reinforce unimodal features while preserving multimodal features in the shared representation.
We evaluate Mirror U-Net on the AutoPET PET/CT and on the multimodal MSD BrainTumor datasets, demonstrating its effectiveness in multimodal segmentation and achieving state-of-the-art performance on both datasets.
- Score: 19.011295977183835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Positron Emission Tomography (PET) and Computed Tomography (CT) are
routinely used together to detect tumors. PET/CT segmentation models can automate
tumor delineation; however, current multimodal models do not fully exploit the
complementary information in each modality, as they either concatenate PET and
CT data or fuse them at the decision level. To address this, we propose Mirror
U-Net, which replaces traditional fusion methods with multimodal fission by
factorizing the multimodal representation into modality-specific branches and
an auxiliary multimodal decoder. At these branches, Mirror U-Net assigns a task
tailored to each modality to reinforce unimodal features while preserving
multimodal features in the shared representation. In contrast to previous
methods that use either fission or multi-task learning, Mirror U-Net combines
both paradigms in a unified framework. We explore various task combinations and
examine which parameters to share in the model. We evaluate Mirror U-Net on the
AutoPET PET/CT and on the multimodal MSD BrainTumor datasets, demonstrating its
effectiveness in multimodal segmentation and achieving state-of-the-art
performance on both datasets. Our code will be made publicly available.
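To make the fission-plus-multi-task idea concrete, the following is a minimal PyTorch sketch of the architecture as described in the abstract: modality-specific encoder branches, a shared multimodal bottleneck, modality-specific decoders each assigned a tailored task (illustratively, tumor segmentation for PET and image reconstruction for CT), and an auxiliary multimodal decoder on the shared representation. All layer sizes, the specific task assignments, and module names are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of multimodal fission with multi-task branches, assuming
# PET -> segmentation and CT -> reconstruction as the modality-tailored tasks.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )


class MirrorUNetSketch(nn.Module):
    def __init__(self, base_ch=16):
        super().__init__()
        # Modality-specific encoders (fission: no early concatenation of inputs).
        self.enc_pet = nn.Sequential(conv_block(1, base_ch), nn.MaxPool3d(2))
        self.enc_ct = nn.Sequential(conv_block(1, base_ch), nn.MaxPool3d(2))
        # Shared bottleneck holding the multimodal representation.
        self.shared = conv_block(2 * base_ch, 2 * base_ch)
        # Modality-specific decoders, each assigned a tailored task.
        self.dec_pet = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            conv_block(2 * base_ch, base_ch),
            nn.Conv3d(base_ch, 2, kernel_size=1),   # tumor segmentation logits
        )
        self.dec_ct = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            conv_block(2 * base_ch, base_ch),
            nn.Conv3d(base_ch, 1, kernel_size=1),   # CT reconstruction
        )
        # Auxiliary multimodal decoder on the shared representation.
        self.dec_multi = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            conv_block(2 * base_ch, base_ch),
            nn.Conv3d(base_ch, 2, kernel_size=1),   # multimodal segmentation
        )

    def forward(self, pet, ct):
        z = self.shared(torch.cat([self.enc_pet(pet), self.enc_ct(ct)], dim=1))
        return self.dec_pet(z), self.dec_ct(z), self.dec_multi(z)


if __name__ == "__main__":
    pet = torch.randn(1, 1, 32, 32, 32)
    ct = torch.randn(1, 1, 32, 32, 32)
    seg_pet, rec_ct, seg_multi = MirrorUNetSketch()(pet, ct)
    print(seg_pet.shape, rec_ct.shape, seg_multi.shape)
```

Each branch contributes its own loss during training, so the unimodal tasks reinforce modality-specific features while the auxiliary decoder keeps the shared bottleneck multimodal; the exact losses and parameter-sharing choices explored in the paper are not reproduced here.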
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Information Interaction for Medical Image Segmentation [24.024848382458767]
We introduce an innovative Multimodal Information Cross Transformer (MicFormer).
It queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features (a minimal cross-attention sketch in this spirit follows this list).
Compared to other multimodal segmentation techniques, our method outperforms them by margins of 2.83 and 4.23, respectively.
arXiv Detail & Related papers (2024-04-25T07:21:14Z)
- Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework [16.864720020158906]
We propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture.
We decompose the traditional problem of synthesizing CT images into distinct subtasks.
To enhance the framework's versatility in handling multi-modal data, we expand the model with multiple image channels.
arXiv Detail & Related papers (2023-12-13T18:22:38Z)
- Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder [1.90365714903665]
This work proposes a unified self-supervised pre-training framework for transferable multi-modal perception representation learning.
We show that the representation learned via the NeRF-Supervised Masked AutoEncoder (NS-MAE) transfers well to diverse multi-modal and single-modal (camera-only and Lidar-only) perception models.
We hope this study can inspire exploration of more general multi-modal representation learning for autonomous agents.
arXiv Detail & Related papers (2023-11-23T00:53:11Z)
- Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing supervised methods often require large amounts of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
- NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation [29.157465321864265]
We propose a novel Nested Modality-Aware Transformer (NestedFormer) to explore the intra-modality and inter-modality relationships of multi-modal MRIs for brain tumor segmentation.
Built on the transformer-based multi-encoder and single-decoder structure, we perform nested multi-modal fusion for high-level representations of different modalities.
arXiv Detail & Related papers (2022-08-31T14:04:25Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
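As referenced in the MicFormer entry above, the sketch below illustrates bidirectional cross-modal attention: tokens from one modality act as queries against keys and values from the other, and the two directions are fused. The token shapes, modality names, head count, and fusion step are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of bidirectional cross-modal attention (MicFormer-style
# querying between two modalities); all dimensions are assumed for illustration.
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.a_queries_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_queries_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tok_a, tok_b):
        # Modality A queries modality B, and vice versa.
        a2b, _ = self.a_queries_b(tok_a, tok_b, tok_b)
        b2a, _ = self.b_queries_a(tok_b, tok_a, tok_a)
        # Fuse the two directions token-wise (equal token counts assumed here).
        return self.fuse(torch.cat([a2b, b2a], dim=-1))


if __name__ == "__main__":
    ct_tokens = torch.randn(2, 128, 64)   # (batch, tokens, dim); placeholder modalities
    mr_tokens = torch.randn(2, 128, 64)
    fused = CrossModalBlock()(ct_tokens, mr_tokens)
    print(fused.shape)  # torch.Size([2, 128, 64])
```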