Related papers: An Effective UNet Using Feature Interaction and Fusion for Organ Segmentation in Medical Image

An Effective UNet Using Feature Interaction and Fusion for Organ Segmentation in Medical Image

URL: http://arxiv.org/abs/2409.05324v2
Date: Sat, 26 Jul 2025 07:15:24 GMT
Title: An Effective UNet Using Feature Interaction and Fusion for Organ Segmentation in Medical Image
Authors: Xiaolin Gou, Chuanlin Liao, Jizhe Zhou, Fengshuo Ye, Yi Lin,
Abstract summary: A novel U-shaped model is proposed to address the above issue, including three plug-and-play modules.<n>A channel spatial interaction module is introduced to improve the quality of skip connection features by modeling inter-stage interactions between the encoder and decoder.<n>A channel attention-based module integrating squeeze-and-excitation mechanisms with convolutional layers is employed in the decoder blocks to strengthen the representation of critical features while suppressing irrelevant ones.<n>A multi-level fusion module is designed to aggregate multi-scale decoder features, improving spatial detail and consistency in the final prediction.
Score: 5.510679875888542
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Nowadays, pre-trained encoders are widely used in medical image segmentation due to their strong capability in extracting rich and generalized feature representations. However, existing methods often fail to fully leverage these features, limiting segmentation performance. In this work, a novel U-shaped model is proposed to address the above issue, including three plug-and-play modules. A channel spatial interaction module is introduced to improve the quality of skip connection features by modeling inter-stage interactions between the encoder and decoder. A channel attention-based module integrating squeeze-and-excitation mechanisms with convolutional layers is employed in the decoder blocks to strengthen the representation of critical features while suppressing irrelevant ones. A multi-level fusion module is designed to aggregate multi-scale decoder features, improving spatial detail and consistency in the final prediction. Comprehensive experiments on the synapse multi-organ segmentation dataset and automated cardiac diagnosis challenge dataset demonstrate that the proposed model outperforms existing state-of-the-art methods, achieving the highest average Dice score of 86.05% and 92.58%, yielding improvements of 1.15% and 0.26%, respectively. In addition, the proposed model provides a balance between accuracy and computational complexity, with only 86.91 million parameters and 23.26 giga floating-point operations.

Related papers

Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization [66.10528870853324]
Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks is critically important.<n>One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities.<n>We propose a plug-and-play regularization term based on functional entropy, which introduces no additional parameters.
arXiv Detail & Related papers (2025-05-10T12:58:15Z)
Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z)
Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation [7.992331117310217]
Referring remote sensing image segmentation (RRSIS) is a novel visual task in remote sensing images segmentation. We design a multimodal-aware fusion network (MAFN) to achieve fine-grained alignment and fusion between the two modalities.
arXiv Detail & Related papers (2025-03-14T08:31:21Z)
Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism.<n>It introduces channel attention and spatial attention modules, enhances the model's ability to focus on important features.<n>The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z)
MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation [8.090155401012169]
Mamba, an emerging model, is one of the most cutting-edge approaches that is widely applied to diverse vision and language tasks.<n>This paper introduces a U-shaped deep learning model incorporating a large-window multiscale mamba module and a hierarchical feature fusion approach for echocardiographic segmentation.
arXiv Detail & Related papers (2025-01-13T08:22:10Z)
PINN-EMFNet: PINN-based and Enhanced Multi-Scale Feature Fusion Network for Breast Ultrasound Images Segmentation [5.246262946799736]
This study proposes a PINN-based and Enhanced Multi-Scale Feature Fusion Network.<n>The network efficiently integrates and globally models multi-scale features through several structural innovations.<n>In the decoder section, a Multi-Scale Feature Refinement Decoder is employed, which, combined with a Multi-Scale Supervision Mechanism and a correction module, significantly improves segmentation accuracy and adaptability.
arXiv Detail & Related papers (2024-12-22T09:16:00Z)
Few-Shot Medical Image Segmentation with Large Kernel Attention [5.630842216128902]
We propose a few-shot medical segmentation model that acquire comprehensive feature representation capabilities. Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module. The results demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-07-27T02:28:30Z)
Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation [19.461033552684576]
We propose a Local-to-Global Cross-modal Attention-aware Fusion (LoGoCAF) framework for HSI-X classification. LoGoCAF adopts a pixel-to-pixel two-branch semantic segmentation architecture to learn information from HSI and X modalities.
arXiv Detail & Related papers (2024-06-25T16:12:20Z)
DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD) It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection. We propose a novel decoder block that integrates feature pyramids and transformers. Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z)
MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation [15.94370954641629]
2.5D-based segmentation models often treat each slice equally, failing to effectively learn and exploit inter-slice information. A novel Momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue. The MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), establishing a new state-of-the-art with 85.63%, 92.19%, and 85.43% of DSC, respectively.
arXiv Detail & Related papers (2024-01-22T11:25:59Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation [48.995367430746086]
High-accuracy Dichotomous Image rendering (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes. We introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to boost the effectiveness of trunk and structure identification. Using 1024*1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.
arXiv Detail & Related papers (2023-07-26T09:04:35Z)
MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection. For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features. For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale Feature Fusion Approach [3.9548535445908928]
CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness. Transformer-based approaches are currently prevailing since they enlarge the reception field to model global contextual correlation. We propose TransCeption for medical image segmentation, a pure transformer-based U-shape network featured by incorporating the inception-like module into the encoder.
arXiv Detail & Related papers (2023-01-25T22:09:07Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content. First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels. To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit. Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder. In this way, the encoder becomes deeply internative, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.