FIF-UNet: An Efficient UNet Using Feature Interaction and Fusion for Medical Image Segmentation
- URL: http://arxiv.org/abs/2409.05324v1
- Date: Mon, 9 Sep 2024 04:34:47 GMT
- Title: FIF-UNet: An Efficient UNet Using Feature Interaction and Fusion for Medical Image Segmentation
- Authors: Xiaolin Gou, Chuanlin Liao, Jizhe Zhou, Fengshuo Ye, Yi Lin,
- Abstract summary: A novel U-shaped model, called FIF-UNet, is proposed to address the above issue, including three plug-and-play modules.
Experiments on the Synapse and ACDC datasets demonstrate that the proposed FIF-UNet outperforms existing state-of-the-art methods.
- Score: 5.510679875888542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, pre-trained encoders are widely used in medical image segmentation because of their ability to capture complex feature representations. However, the existing models fail to effectively utilize the rich features obtained by the pre-trained encoder, resulting in suboptimal segmentation results. In this work, a novel U-shaped model, called FIF-UNet, is proposed to address the above issue, including three plug-and-play modules. A channel spatial interaction module (CSI) is proposed to obtain informative features by establishing the interaction between encoder stages and corresponding decoder stages. A cascaded conv-SE module (CoSE) is designed to enhance the representation of critical features by adaptively assigning importance weights on different feature channels. A multi-level fusion module (MLF) is proposed to fuse the multi-scale features from the decoder stages, ensuring accurate and robust final segmentation. Comprehensive experiments on the Synapse and ACDC datasets demonstrate that the proposed FIF-UNet outperforms existing state-of-the-art methods, which achieves the highest average DICE of 86.05% and 92.58%, respectively.
Related papers
- Few-Shot Medical Image Segmentation with Large Kernel Attention [5.630842216128902]
We propose a few-shot medical segmentation model that acquire comprehensive feature representation capabilities.
Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module.
The results demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-07-27T02:28:30Z) - Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation [19.461033552684576]
We propose a Local-to-Global Cross-modal Attention-aware Fusion (LoGoCAF) framework for HSI-X classification.
LoGoCAF adopts a pixel-to-pixel two-branch semantic segmentation architecture to learn information from HSI and X modalities.
arXiv Detail & Related papers (2024-06-25T16:12:20Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z) - MOSformer: Momentum encoder-based inter-slice fusion transformer for
medical image segmentation [15.94370954641629]
2.5D-based segmentation models often treat each slice equally, failing to effectively learn and exploit inter-slice information.
A novel Momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue.
The MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), establishing a new state-of-the-art with 85.63%, 92.19%, and 85.43% of DSC, respectively.
arXiv Detail & Related papers (2024-01-22T11:25:59Z) - MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z) - Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale
Feature Fusion Approach [3.9548535445908928]
CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness.
Transformer-based approaches are currently prevailing since they enlarge the reception field to model global contextual correlation.
We propose TransCeption for medical image segmentation, a pure transformer-based U-shape network featured by incorporating the inception-like module into the encoder.
arXiv Detail & Related papers (2023-01-25T22:09:07Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF)
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply internative, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.