TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation
with Transformers
- URL: http://arxiv.org/abs/2203.10726v1
- Date: Mon, 21 Mar 2022 04:02:54 GMT
- Title: TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation
with Transformers
- Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Zhennan Yan, Mu Zhou and Dimitris
Metaxas
- Abstract summary: We present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms.
In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining.
- Score: 8.139069987207494
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Combining information from multi-view images is crucial to improve the
performance and robustness of automated methods for disease diagnosis. However,
due to the non-alignment characteristics of multi-view images, building
correlation and data fusion across views largely remain an open problem. In
this study, we present TransFusion, a Transformer-based architecture to merge
divergent multi-view imaging information using convolutional layers and
powerful attention mechanisms. In particular, the Divergent Fusion Attention
(DiFA) module is proposed for rich cross-view context modeling and semantic
dependency mining, addressing the critical issue of capturing long-range
correlations between unaligned data from different image views. We further
propose the Multi-Scale Attention (MSA) to collect global correspondence of
multi-scale feature representations. We evaluate TransFusion on the
Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in
Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading
performance against the state-of-the-art methods and opens up new perspectives
for multi-view imaging integration towards robust medical image segmentation.
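To make the idea of cross-view attention in the abstract more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which tokens from one view query tokens from another view. It is not the paper's DiFA or MSA module: the function names, the toy token values, and the two-view setup (`view_a`, `view_b`) are all illustrative assumptions, and learned projections, multiple heads, and positional encodings are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(queries, keys, values):
    """Generic scaled dot-product attention (hypothetical helper).

    queries come from one view; keys and values come from another view.
    Every query token attends to every key token, so no spatial alignment
    between the two views is assumed.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Similarity of this query token to each token of the other view.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other view's value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Toy tokens (assumed values): 2 tokens from view A, 3 tokens from view B.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each view-A token gathers context from all view-B tokens.
fused = cross_view_attention(view_a, view_b, view_b)
```

Because each output row is a softmax-weighted average over all tokens of the other view, the two views may have different numbers of tokens and need not be spatially registered, which is the property the abstract highlights for unaligned multi-view data.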
Related papers
- A New Multimodal Medical Image Fusion based on Laplacian Autoencoder
with Channel Attention [3.1531360678320897]
Deep learning models have achieved end-to-end image fusion with highly robust and accurate performance.
Most DL-based fusion models perform down-sampling on the input images to minimize the number of learnable parameters and computations.
We propose a new multimodal medical image fusion model based on integrated Laplacian-Gaussian concatenation with attention pooling.
arXiv Detail & Related papers (2023-10-18T11:29:53Z)
- AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential
Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through a frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
arXiv Detail & Related papers (2023-10-09T07:10:30Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation method with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and
Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- Multi-task Paired Masking with Alignment Modeling for Medical
Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- TFormer: A throughout fusion transformer for multi-modal skin lesion
diagnosis [6.899641625551976]
We introduce a pure transformer-based method, which we refer to as the Throughout Fusion Transformer (TFormer), for sufficient information integration in MSLD.
We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across different image modalities in a stage-by-stage way.
Our TFormer outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2022-11-21T12:07:05Z)
- TranSiam: Fusing Multimodal Visual Features Using Transformer for
Medical Image Segmentation [4.777011444412729]
We propose a segmentation method suitable for multimodal medical images that can capture global information.
TranSiam is a 2D dual path network that extracts features of different modalities.
On the BraTS 2019 and BraTS 2020 multimodal datasets, our method achieves a significant improvement in accuracy over other popular methods.
arXiv Detail & Related papers (2022-04-26T09:39:10Z)
- TransAttUnet: Multi-level Attention-guided U-Net with Transformer for
Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T09:17:06Z)
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement
and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.