MSA2-Net: Utilizing Self-Adaptive Convolution Module to Extract Multi-Scale Information in Medical Image Segmentation
- URL: http://arxiv.org/abs/2509.01498v2
- Date: Wed, 03 Sep 2025 02:39:39 GMT
- Title: MSA2-Net: Utilizing Self-Adaptive Convolution Module to Extract Multi-Scale Information in Medical Image Segmentation
- Authors: Chao Deng, Xiaosen Li, Xiao Qin
- Abstract summary: The Self-Adaptive Convolution Module dynamically adjusts the size of its convolution kernels depending on the unique fingerprints of different datasets. The module is strategically integrated into two key components of MSA2-Net: the Multi-Scale Convolution Bridge and the Multi-Scale Amalgamation Decoder.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The nnUNet segmentation framework adeptly adjusts most hyperparameters in training scripts automatically, but it overlooks the tuning of internal hyperparameters within the segmentation network itself, which constrains the model's ability to generalize. Addressing this limitation, this study presents a novel Self-Adaptive Convolution Module that dynamically adjusts the size of the convolution kernels depending on the unique fingerprints of different datasets. This adjustment enables MSA2-Net, when equipped with this module, to proficiently capture both global and local features within the feature maps. The Self-Adaptive Convolution Module is strategically integrated into two key components of MSA2-Net: the Multi-Scale Convolution Bridge (MSConvBridge) and the Multi-Scale Amalgamation Decoder (MSADecoder). In the MSConvBridge, the module enhances the ability to refine outputs from various stages of the CSWin Transformer during the skip connections, effectively eliminating redundant data that could impair the decoder's performance. Simultaneously, the MSADecoder, utilizing the module, excels at capturing detailed information of organs varying in size during the decoding phase. This capability ensures that the decoder's output closely reproduces the intricate details within the feature maps, yielding highly accurate segmentation images. MSA2-Net, bolstered by this architecture, achieves Dice coefficient scores of 86.49%, 92.56%, 93.37%, and 92.98% on the Synapse, ACDC, Kvasir, and Skin Lesion Segmentation (ISIC2017) datasets, respectively. This underscores MSA2-Net's robustness and precision in medical image segmentation tasks across various datasets.
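The abstract's core mechanism, choosing a convolution kernel size from a dataset's fingerprint, can be illustrated with a minimal sketch. The fingerprint feature (median image size) and the doubling rule below are assumptions chosen for illustration; the paper's actual selection rule is not reproduced here.

```python
# Hypothetical sketch of a self-adaptive kernel-size rule. The paper's exact
# fingerprint features and mapping are not given in this listing, so median
# image size and a factor-of-two schedule are stand-ins.

def adaptive_kernel_size(median_image_size: int, min_k: int = 3, max_k: int = 9) -> int:
    """Pick an odd kernel size that grows with the dataset's median image size."""
    k = min_k
    size = 128                     # assumed baseline resolution
    while size < median_image_size and k < max_k:
        k += 2                     # keep the kernel odd so padding stays symmetric
        size *= 2                  # one kernel step per doubling of resolution
    return k

# Small images get local receptive fields, large ones get wider kernels.
print(adaptive_kernel_size(128))   # 3
print(adaptive_kernel_size(512))   # 7
```

The clamping to `max_k` keeps the receptive field bounded so very large scans do not blow up the parameter count.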
Related papers
- Re-Densification Meets Cross-Scale Propagation: Real-Time Compression of LiDAR Point Clouds [84.36825469211375]
LiDAR point clouds are fundamental to various applications, yet high-precision scans incur substantial storage and transmission overhead. Existing methods typically convert unordered points into hierarchical octree or voxel structures for dense-to-sparse predictive coding. Our framework comprises two lightweight modules. First, the Geometry Re-Densification Module re-densifies encoded sparse geometry, extracts features at a denser scale, and then re-sparsifies the features for predictive coding.
arXiv Detail & Related papers (2025-08-28T06:36:10Z) - MSA$^2$Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation [8.404273502720136]
We introduce MSA$2$Net, a new deep segmentation framework featuring an expedient design of skip-connections.
We propose a Multi-Scale Adaptive Spatial Attention Gate (MASAG) to ensure that spatially relevant features are selectively highlighted.
Our MSA$2$Net outperforms state-of-the-art (SOTA) works or matches their performance.
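The MASAG idea of selectively highlighting spatially relevant skip features can be sketched as a simple gate over a skip connection. The additive gate below is an assumed stand-in; the real MASAG's multi-scale interaction is more involved.

```python
import math

# Minimal sketch of a spatial attention gate on a skip path (in the spirit of
# MASAG). Combining encoder and decoder evidence additively before the sigmoid
# is an assumption for illustration.

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def attention_gate(encoder_feats, decoder_feats):
    """Per-position gate from encoder + decoder evidence; gates the skip path."""
    gates = [sigmoid(e + d) for e, d in zip(encoder_feats, decoder_feats)]
    return [e * g for e, g in zip(encoder_feats, gates)]

enc = [2.0, -1.0, 0.5]   # toy per-position encoder activations
dec = [1.0, -2.0, 0.0]   # toy decoder activations at the same positions
print(attention_gate(enc, dec))
```

Positions where both streams agree pass through nearly unchanged; positions the decoder deems irrelevant are suppressed toward zero.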
arXiv Detail & Related papers (2024-07-31T14:41:10Z) - SACNet: A Spatially Adaptive Convolution Network for 2D Multi-organ Medical Segmentation [7.897088081928714]
Multi-organ segmentation in medical image analysis is crucial for diagnosis and treatment planning. In this paper, we utilize the knowledge of Deformable Convolution V3 to optimize our Spatially Adaptive Convolution Network (SACNet). Experiments on 3D slice datasets from ACDC and Synapse demonstrate that SACNet delivers superior segmentation performance compared to several existing methods.
arXiv Detail & Related papers (2024-07-14T10:58:09Z) - P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation [8.46409964236009]
Diffusion models and multi-scale features are essential components in semantic segmentation tasks.
We propose a new model for semantic segmentation known as the diffusion model with parallel multi-scale branches.
Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets.
arXiv Detail & Related papers (2024-05-30T19:40:08Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy
Dichotomous Image Segmentation [48.995367430746086]
High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes.
We introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to boost the effectiveness of trunk and structure identification.
Using 1024×1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.
arXiv Detail & Related papers (2023-07-26T09:04:35Z) - LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using
Multi-Scale Convolution Attention [0.0]
We propose a projection-based semantic segmentation network called LENet with an encoder-decoder structure for LiDAR-based semantic segmentation.
The encoder is composed of a novel multi-scale convolutional attention (MSCA) module with varying receptive field sizes to capture features.
We show that our proposed method is lighter, more efficient, and robust compared to state-of-the-art semantic segmentation methods.
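The MSCA idea, running branches with different receptive fields and fusing them, can be sketched in one dimension. The moving-average branches and the fixed softmax fusion weights below are assumptions; in MSCA the branch convolutions and weights are learned.

```python
import math

def moving_average(x, k):
    """1-D moving average with window k (same length, edge-padded) --
    a stand-in for a depthwise conv branch with receptive field k."""
    r = k // 2
    padded = [x[0]] * r + list(x) + [x[-1]] * r
    return [sum(padded[i:i + k]) / k for i in range(len(x))]

def multi_scale_attention(x, scales=(3, 5, 7), logits=(0.0, 0.0, 0.0)):
    """Fuse branches with different receptive fields via softmax weights.
    In MSCA the weights are learned; here they are fixed inputs (assumption)."""
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    branches = [moving_average(x, k) for k in scales]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(x))]

signal = [0, 0, 1, 0, 0, 0]        # a single spike
print(multi_scale_attention(signal))
```

Small windows keep the spike sharp while large windows spread it; the fused output trades off both, which is the point of mixing receptive field sizes.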
arXiv Detail & Related papers (2023-01-11T02:51:38Z) - Lightweight Salient Object Detection in Optical Remote-Sensing Images
via Semantic Matching and Edge Alignment [61.45639694373033]
We propose a novel lightweight network for salient object detection in optical remote sensing images (ORSI-SOD) based on semantic matching and edge alignment, termed SeaNet.
Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, and a portable decoder for inference.
arXiv Detail & Related papers (2023-01-07T04:33:51Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z) - TAM: Temporal Adaptive Module for Video Recognition [60.83208364110288]
The temporal adaptive module (TAM) generates video-specific temporal kernels based on its own feature map.
Experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently.
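TAM's key idea, deriving a temporal kernel from the clip's own features rather than using one fixed weight, can be sketched as follows. The logits branch below (taps scaled by the clip's mean activation) is an assumed toy stand-in for TAM's learned kernel-generation branch.

```python
import math

# Hypothetical 1-D sketch of a video-specific temporal kernel in the spirit of
# TAM: the kernel is produced from the clip itself, softmax-normalized, and
# applied as a temporal convolution over the frame features.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def temporal_adaptive_conv(frames, kernel_logits_fn):
    """frames: per-frame feature values (length T); kernel derived per video."""
    logits = kernel_logits_fn(frames)   # video-specific, not a fixed weight
    kernel = softmax(logits)            # normalized temporal kernel
    k, r = len(kernel), len(kernel) // 2
    padded = [frames[0]] * r + list(frames) + [frames[-1]] * r
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(frames))]

# Toy logits branch: three fixed taps scaled by the clip's mean activation,
# so different videos yield different kernels (assumption for illustration).
logits_fn = lambda fs: [f * (sum(fs) / len(fs)) for f in (-1.0, 2.0, -1.0)]
print(temporal_adaptive_conv([0.0, 1.0, 0.0, 1.0], logits_fn))
```

Because the kernel depends on the input clip, two videos with different statistics are smoothed differently, which a fixed temporal convolution cannot do.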
arXiv Detail & Related papers (2020-05-14T08:22:45Z) - Encoder-Decoder Based Convolutional Neural Networks with
Multi-Scale-Aware Modules for Crowd Counting [6.893512627479196]
We propose two modified neural networks for accurate and efficient crowd counting.
The first model is named M-SFANet, which is augmented with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN).
The second model is called M-SegNet, which is produced by replacing the bilinear upsampling in SFANet with max unpooling that is used in SegNet.
arXiv Detail & Related papers (2020-03-12T03:00:26Z)
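The max unpooling that M-SegNet borrows from SegNet can be illustrated in one dimension: pooling records the argmax positions, and unpooling writes each value back to its recorded position, zero-filling elsewhere (unlike bilinear upsampling, which blurs). The 1-D form below is a simplification of the 2-D operation.

```python
# 1-D sketch of SegNet-style max unpooling, contrasted with naive upsampling.

def max_pool_with_indices(x, k=2):
    """Non-overlapping max pooling that also records argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(x) - k + 1, k):
        window = x[start:start + k]
        offset = max(range(k), key=lambda j: window[j])
        pooled.append(window[offset])
        indices.append(start + offset)   # remember where each max came from
    return pooled, indices

def max_unpool(pooled, indices, out_len):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [0.0] * out_len
    for v, i in zip(pooled, indices):
        out[i] = v
    return out

x = [1.0, 3.0, 2.0, 0.5]
pooled, idx = max_pool_with_indices(x)
print(pooled)                            # [3.0, 2.0]
print(max_unpool(pooled, idx, len(x)))   # [0.0, 3.0, 2.0, 0.0]
```

Restoring values at their original positions preserves boundary locations through the decoder, which is why SegNet-style decoders favor it over bilinear upsampling for segmentation.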
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.