Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for
Semantic Segmentation in Medical Imaging
- URL: http://arxiv.org/abs/2303.07126v1
- Date: Mon, 13 Mar 2023 13:57:29 GMT
- Title: Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for
Semantic Segmentation in Medical Imaging
- Authors: Zdravko Marinov, Simon Reiß, David Kersting, Jens Kleesiek, Rainer
Stiefelhagen
- Abstract summary: We propose Mirror U-Net, which replaces traditional fusion methods with multimodal fission.
Mirror U-Net assigns a task tailored to each modality to reinforce unimodal features while preserving multimodal features in the shared representation.
We evaluate Mirror U-Net on the AutoPET PET/CT and on the multimodal MSD BrainTumor datasets, demonstrating its effectiveness in multimodal segmentation and achieving state-of-the-art performance on both datasets.
- Score: 19.011295977183835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Positron Emission Tomography (PET) and Computed Tomography (CT) are routinely
used together to detect tumors. PET/CT segmentation models can automate tumor
delineation; however, current multimodal models do not fully exploit the
complementary information in each modality, as they either concatenate PET and
CT data or fuse them at the decision level. To address this, we propose Mirror
U-Net, which replaces traditional fusion methods with multimodal fission by
factorizing the multimodal representation into modality-specific branches and
an auxiliary multimodal decoder. At these branches, Mirror U-Net assigns a task
tailored to each modality to reinforce unimodal features while preserving
multimodal features in the shared representation. In contrast to previous
methods that use either fission or multi-task learning, Mirror U-Net combines
both paradigms in a unified framework. We explore various task combinations and
examine which parameters to share in the model. We evaluate Mirror U-Net on the
AutoPET PET/CT and on the multimodal MSD BrainTumor datasets, demonstrating its
effectiveness in multimodal segmentation and achieving state-of-the-art
performance on both datasets. Our code will be made publicly available.
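To make the fission idea concrete, here is a minimal, illustrative PyTorch layout (a sketch under assumptions, not the authors' released code): a shared representation feeds two modality-specific decoders plus an auxiliary multimodal decoder, and each branch is trained on a task tailored to its modality. Module names, channel sizes, and the choice of per-branch tasks below are assumptions.
```python
# Minimal sketch of the fission idea behind Mirror U-Net (illustrative only,
# not the authors' implementation): one shared representation, two
# modality-specific decoders, and an auxiliary multimodal decoder,
# each with its own task head.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )


class MirrorUNetSketch(nn.Module):
    def __init__(self, base_ch: int = 16):
        super().__init__()
        # Separate shallow encoders for PET and CT volumes (1 channel each).
        self.enc_pet = conv_block(1, base_ch)
        self.enc_ct = conv_block(1, base_ch)
        # Shared block holds the joint (multimodal) representation.
        self.shared = conv_block(2 * base_ch, 2 * base_ch)
        # Fission: three decoders read the shared representation.
        self.dec_pet = conv_block(2 * base_ch, base_ch)    # PET-tailored task
        self.dec_ct = conv_block(2 * base_ch, base_ch)     # CT-tailored task
        self.dec_joint = conv_block(2 * base_ch, base_ch)  # auxiliary multimodal task
        # Task heads: e.g. tumor mask from the PET branch, CT reconstruction
        # from the CT branch, joint segmentation from the auxiliary branch
        # (these task assignments are assumptions for illustration).
        self.head_pet = nn.Conv3d(base_ch, 2, kernel_size=1)
        self.head_ct = nn.Conv3d(base_ch, 1, kernel_size=1)
        self.head_joint = nn.Conv3d(base_ch, 2, kernel_size=1)

    def forward(self, pet: torch.Tensor, ct: torch.Tensor):
        z = self.shared(torch.cat([self.enc_pet(pet), self.enc_ct(ct)], dim=1))
        return (
            self.head_pet(self.dec_pet(z)),      # per-modality output (PET)
            self.head_ct(self.dec_ct(z)),        # per-modality output (CT)
            self.head_joint(self.dec_joint(z)),  # multimodal output
        )


if __name__ == "__main__":
    model = MirrorUNetSketch()
    pet = torch.randn(1, 1, 32, 32, 32)
    ct = torch.randn(1, 1, 32, 32, 32)
    seg_pet, rec_ct, seg_joint = model(pet, ct)
    print(seg_pet.shape, rec_ct.shape, seg_joint.shape)
```
A multi-task loss would then weight the three branch outputs; which tasks to pair and which parameters to share are exactly the design choices the paper explores.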
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
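As a rough, hedged sketch of the distillation step described above (not the MIND implementation; the temperature, loss weighting, and toy inputs are assumptions), a smaller multimodal student can be trained to match the averaged soft predictions of an ensemble of pre-trained teachers:
```python
# Hedged sketch of ensemble-to-student distillation in the spirit of MIND
# (illustrative only; temperature, alpha, and the toy tensors are assumptions).
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits_list, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with KL divergence to the averaged teachers."""
    # Average the temperature-softened predictions of all pre-trained teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Toy usage: two frozen teachers of different sizes, one smaller student.
student_logits = torch.randn(8, 2, requires_grad=True)
teacher_logits = [torch.randn(8, 2), torch.randn(8, 2)]
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```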
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation [97.82707398481273]
We develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF)
Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner.
We perform extensive experiments on three benchmark datasets, demonstrating the significant improvements over several state-of-the-art multimodal recommendation models.
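A minimal, assumption-laden sketch of the mechanism described above (not the released MetaMMF code): a meta learner maps meta information derived from an item's multimodal features to the weights of a small item-specific fusion network. Here the form of the meta information, the layer sizes, and the output activation are all illustrative choices.
```python
# Illustrative sketch of meta-learned, item-specific fusion (in the spirit of
# MetaMMF; the meta-information encoding and all dimensions are assumptions).
import torch
import torch.nn as nn


class MetaFusion(nn.Module):
    def __init__(self, visual_dim: int = 64, text_dim: int = 32, fused_dim: int = 32):
        super().__init__()
        self.in_dim = visual_dim + text_dim
        self.fused_dim = fused_dim
        # Meta learner: maps meta information (here, simply the concatenated
        # modality features) to the parameters of a one-layer fusion network.
        self.meta = nn.Linear(self.in_dim, fused_dim * self.in_dim + fused_dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        x = torch.cat([visual, text], dim=-1)        # (batch, in_dim)
        params = self.meta(x)                        # item-specific parameters
        w = params[:, : self.fused_dim * self.in_dim]
        b = params[:, self.fused_dim * self.in_dim:]
        w = w.view(-1, self.fused_dim, self.in_dim)  # per-item weight matrix
        # Apply the generated fusion function to that item's own features.
        return torch.tanh(torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b)


fused = MetaFusion()(torch.randn(4, 64), torch.randn(4, 32))
print(fused.shape)  # torch.Size([4, 32])
```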
arXiv Detail & Related papers (2025-01-13T07:51:43Z)
- Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment [3.700932355945534]
Multimodal learning has been demonstrated to enhance performance across various clinical tasks.
We introduce Diff4MMLiTS, a four-stage multimodal liver tumor segmentation pipeline.
Experiments on public and internal datasets demonstrate the superiority of Diff4MMLiTS over other state-of-the-art multimodal segmentation methods.
arXiv Detail & Related papers (2024-12-29T09:55:00Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
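The scale-wise fusion mentioned above can be illustrated with a generic sketch (this is not the U3M architecture; the feature-pyramid shapes and the 1x1-convolution fusion operator are assumptions):
```python
# Illustrative multiscale fusion of two modality feature pyramids (a generic
# stand-in; shapes and the fusion operator are assumptions).
import torch
import torch.nn as nn


class ScaleFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One lightweight fusion convolution per pyramid level.
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, pyramid_a, pyramid_b):
        # Fuse the two modalities level by level so that both coarse (global)
        # and fine (local) features contribute at every scale.
        return [
            fuse(torch.cat([fa, fb], dim=1))
            for fuse, fa, fb in zip(self.fuse, pyramid_a, pyramid_b)
        ]


channels = [32, 64, 128]
fusion = ScaleFusion(channels)
pyr_a = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i) for i, c in enumerate(channels)]
pyr_b = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i) for i, c in enumerate(channels)]
print([f.shape for f in fusion(pyr_a, pyr_b)])
```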
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Information Interaction for Medical Image Segmentation [24.024848382458767]
We introduce an innovative Multimodal Information Cross Transformer (MicFormer)
It queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features.
Our method outperforms other multimodal segmentation techniques by margins of 2.83 and 4.23, respectively.
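The query-and-retrieve interaction described above is, in essence, cross-attention between the two modalities' token sequences. Below is a minimal hedged sketch using standard multi-head attention (not the MicFormer implementation, whose dual-branch design is more involved), where one modality supplies the queries and the other the keys and values.
```python
# Minimal cross-modal attention sketch (illustrative; not MicFormer itself).
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod: torch.Tensor, other_mod: torch.Tensor) -> torch.Tensor:
        # Tokens from one modality query the other and retrieve responses.
        out, _ = self.attn(query=query_mod, key=other_mod, value=other_mod)
        return self.norm(query_mod + out)  # residual keeps unimodal features


block = CrossModalBlock()
a_tokens = torch.randn(2, 128, 64)   # e.g. flattened patch features, modality A
b_tokens = torch.randn(2, 128, 64)   # e.g. flattened patch features, modality B
a_enriched = block(a_tokens, b_tokens)   # A queries B
b_enriched = block(b_tokens, a_tokens)   # B queries A
print(a_enriched.shape, b_enriched.shape)
```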
arXiv Detail & Related papers (2024-04-25T07:21:14Z)
- Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder [1.90365714903665]
This work proposes a unified self-supervised pre-training framework for transferable multi-modal perception representation learning.
We show that the representation learned via NeRF-Supervised Masked AutoEncoder (NS-MAE) shows promising transferability for diverse multi-modal and single-modal (camera-only and Lidar-only) perception models.
We hope this study can inspire exploration of more general multi-modal representation learning for autonomous agents.
arXiv Detail & Related papers (2023-11-23T00:53:11Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
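A hedged sketch of one such fusion layer (a generic graph-attention stand-in, not the paper's encoder; node features, graph construction, and dimensions are assumptions): word nodes and image-region nodes live in one unified graph, and each layer lets nodes exchange messages with their neighbors to update node representations.
```python
# Illustrative graph fusion layer over a unified text+image graph (a generic
# stand-in for graph-based multi-modal fusion layers; all sizes assumed).
import torch
import torch.nn as nn


class GraphFusionLayer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Attention restricted to graph edges: nodes only attend to neighbors.
        scores = self.q(nodes) @ self.k(nodes).transpose(-1, -2) / nodes.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        messages = torch.softmax(scores, dim=-1) @ self.v(nodes)
        return self.norm(nodes + messages)  # updated node representations


# Unified multi-modal graph: e.g. 6 word nodes plus 4 image-region nodes.
nodes = torch.randn(1, 10, 64)
adj = torch.ones(1, 10, 10)  # fully connected here; real graphs are sparser
out = GraphFusionLayer()(nodes, adj)
print(out.shape)  # torch.Size([1, 10, 64])
```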
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
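The parameter-sharing idea can be sketched as follows (an illustration under assumptions, not the authors' code; in particular, the modality-specific normalization layers are an assumption of this sketch): a single set of convolutional kernels processes both CT and MRI inputs.
```python
# Sketch of sharing convolutional kernels across CT and MRI (illustrative; the
# modality-specific normalization shown here is an assumption of this sketch).
import torch
import torch.nn as nn


class SharedKernelBlock(nn.Module):
    def __init__(self, in_ch: int = 1, out_ch: int = 16):
        super().__init__()
        # One set of convolution weights serves both modalities.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Per-modality normalization handles differing intensity statistics.
        self.norm = nn.ModuleDict({
            "ct": nn.BatchNorm2d(out_ch),
            "mri": nn.BatchNorm2d(out_ch),
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        return self.act(self.norm[modality](self.conv(x)))


block = SharedKernelBlock()
ct_feat = block(torch.randn(2, 1, 64, 64), "ct")
mri_feat = block(torch.randn(2, 1, 64, 64), "mri")
print(ct_feat.shape, mri_feat.shape)
```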
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.