TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
- URL: http://arxiv.org/abs/2203.10726v1
- Date: Mon, 21 Mar 2022 04:02:54 GMT
- Title: TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
- Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Zhennan Yan, Mu Zhou and Dimitris Metaxas
- Abstract summary: We present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms.
In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining.
- Score: 8.139069987207494
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Combining information from multi-view images is crucial to improve the
performance and robustness of automated methods for disease diagnosis. However,
due to the non-alignment characteristics of multi-view images, building
correlation and data fusion across views largely remains an open problem. In
this study, we present TransFusion, a Transformer-based architecture to merge
divergent multi-view imaging information using convolutional layers and
powerful attention mechanisms. In particular, the Divergent Fusion Attention
(DiFA) module is proposed for rich cross-view context modeling and semantic
dependency mining, addressing the critical issue of capturing long-range
correlations between unaligned data from different image views. We further
propose the Multi-Scale Attention (MSA) to collect global correspondence of
multi-scale feature representations. We evaluate TransFusion on the
Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in
Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading
performance against the state-of-the-art methods and opens up new perspectives
for multi-view imaging integration towards robust medical image segmentation.
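Since the abstract gives no implementation details, the sketch below illustrates the cross-view attention idea behind DiFA; the module name, token shapes, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Hypothetical sketch of DiFA-style divergent fusion: tokens from one
    view attend to tokens from another, spatially unaligned view."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a: (B, N_a, C) tokens from view A (e.g., short-axis cardiac MRI)
        # x_b: (B, N_b, C) tokens from view B (e.g., long-axis cardiac MRI)
        # Queries come from view A, keys/values from view B, so no pixel-level
        # alignment between the two token grids is required.
        fused, _ = self.attn(query=x_a, key=x_b, value=x_b)
        return self.norm(x_a + fused)  # residual connection, then LayerNorm

# Example: fuse 196 short-axis tokens with 49 long-axis tokens at C = 256.
difa = CrossViewAttention(dim=256)
out = difa(torch.randn(2, 196, 256), torch.randn(2, 49, 256))
print(out.shape)  # torch.Size([2, 196, 256])
```

Because attention operates on token sets rather than aligned grids, the same pattern extends naturally to correspondences across scales, which is the role the abstract ascribes to MSA.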
Related papers
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image Segmentation [27.758157788769253]
We propose a Lagrange Duality Consistency (LDC) Loss, integrated with Boundary-Aware Contrastive Loss, as the overall training objective for semi-supervised learning.
We also introduce CMAformer, a novel network that synergizes the strengths of ResUNet and Transformer.
Overall, our results indicate that CMAformer, combined with the feature fusion framework and the new consistency loss, demonstrates strong complementarity in semi-supervised learning ensembles.
arXiv Detail & Related papers (2024-09-12T06:52:46Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- A New Multimodal Medical Image Fusion based on Laplacian Autoencoder with Channel Attention [3.1531360678320897]
Deep learning models have achieved end-to-end image fusion with highly robust and accurate performance.
Most DL-based fusion models perform down-sampling on the input images to minimize the number of learnable parameters and computations.
We propose a new multimodal medical image fusion model based on integrated Laplacian-Gaussian concatenation with attention pooling.
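The Laplacian-Gaussian decomposition referred to here is a standard pyramid construction; the sketch below shows that decomposition plus a toy per-band fusion rule (the max rule and all names are assumptions, not the paper's model).

```python
import torch
import torch.nn.functional as F

def gaussian_laplacian_pyramid(img: torch.Tensor, levels: int = 3):
    """Standard pyramid decomposition (not the paper's exact model): each
    Laplacian band keeps the detail lost by one down/up-sampling step."""
    gaussians, laplacians = [img], []
    for _ in range(levels):
        down = F.avg_pool2d(gaussians[-1], kernel_size=2)  # crude low-pass
        up = F.interpolate(down, scale_factor=2, mode="bilinear",
                           align_corners=False)
        laplacians.append(gaussians[-1] - up)  # band-pass detail residual
        gaussians.append(down)
    return gaussians, laplacians

# Toy per-band fusion of a hypothetical MRI/CT pair by element-wise maximum.
mri, ct = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
_, lap_mri = gaussian_laplacian_pyramid(mri)
_, lap_ct = gaussian_laplacian_pyramid(ct)
fused_bands = [torch.maximum(a, b) for a, b in zip(lap_mri, lap_ct)]
print(len(fused_bands), fused_bands[0].shape)  # 3 torch.Size([1, 1, 256, 256])
```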
arXiv Detail & Related papers (2023-10-18T11:29:53Z)
- AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention [6.910879180358217]
We propose AdaFuse, in which multimodal image information is fused adaptively through frequency-guided attention mechanism.
The proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics.
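As a rough illustration of frequency-guided fusion (the summary does not describe AdaFuse's exact mechanism), each modality can be split into low- and high-frequency parts with an FFT mask and fused by simple rules; everything below is a toy example, not AdaFuse's code.

```python
import torch

def frequency_split(img: torch.Tensor, radius: int = 16):
    """Toy low/high-frequency split with a circular FFT mask.
    A stand-in for frequency guidance, not AdaFuse's actual code."""
    _, _, H, W = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    mask = ((yy - H // 2) ** 2 + (xx - W // 2) ** 2 <= radius ** 2)
    low = torch.fft.ifft2(
        torch.fft.ifftshift(spec * mask.to(spec.dtype), dim=(-2, -1))).real
    return low, img - low  # (low-frequency content, high-frequency residual)

# Toy fusion rule: keep MRI's low frequencies (soft-tissue contrast) and the
# element-wise stronger high-frequency detail from either modality.
mri, ct = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
low_mri, high_mri = frequency_split(mri)
_, high_ct = frequency_split(ct)
fused = low_mri + torch.where(high_mri.abs() > high_ct.abs(), high_mri, high_ct)
print(fused.shape)  # torch.Size([1, 1, 128, 128])
```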
arXiv Detail & Related papers (2023-10-09T07:10:30Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
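The wavelet decomposition step itself is standard; a minimal sketch with PyWavelets follows, assuming a single-level Haar transform (UFAFormer's actual wavelet choice and depth may differ).

```python
import numpy as np
import pywt

# Single-level 2-D DWT: one low-frequency approximation plus three
# high-frequency detail sub-bands (horizontal, vertical, diagonal), where
# forgery artifacts such as blending seams tend to concentrate.
face = np.random.rand(224, 224).astype(np.float32)  # stand-in face crop
cA, (cH, cV, cD) = pywt.dwt2(face, "haar")
print(cA.shape, cH.shape)  # (112, 112) (112, 112)

# Each sub-band would then be tokenized for intra-band self-attention, with
# inter-band attention exchanging information across the four sub-bands.
```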
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis [6.899641625551976]
We introduce a pure transformer-based method, which we refer to as the Throughout Fusion Transformer (TFormer), for sufficient information integration in multi-modal skin lesion diagnosis (MSLD).
We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across different image modalities in a stage-by-stage way.
Our TFormer outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2022-11-21T12:07:05Z)
- TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
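The multi-scale skip connections between decoder blocks can be pictured as resampling every decoder output to a common resolution and concatenating; the sketch below is schematic and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aggregate_decoder_features(feats: list[torch.Tensor]) -> torch.Tensor:
    """Schematic multi-scale skip aggregation (not TransAttUnet's exact code):
    upsample every decoder output to the finest resolution and concatenate."""
    target = feats[-1].shape[-2:]  # spatial size of the finest decoder block
    upsampled = [F.interpolate(f, size=target, mode="bilinear",
                               align_corners=False) for f in feats]
    return torch.cat(upsampled, dim=1)  # channel-wise fusion of all scales

# Three decoder stages with 256/128/64 channels at 32/64/128 resolution.
feats = [torch.randn(1, 256, 32, 32),
         torch.randn(1, 128, 64, 64),
         torch.randn(1, 64, 128, 128)]
print(aggregate_decoder_features(feats).shape)  # torch.Size([1, 448, 128, 128])
```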
arXiv Detail & Related papers (2021-07-12T09:17:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.