S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
- URL: http://arxiv.org/abs/2601.01285v1
- Date: Sat, 03 Jan 2026 21:03:54 GMT
- Title: S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
- Authors: Md. Sanaullah Chowdhury, Lameya Sabrin
- Abstract summary: Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. We propose S2M-Net, a 4.7M-parameter architecture that achieves global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Loss (MAS
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. Convolutional networks provide local precision at $\mathcal{O}(n)$ cost but have limited receptive fields, whereas vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computational expense, causing overfitting on small clinical datasets. We propose S2M-Net, a 4.7M-parameter architecture that achieves $\mathcal{O}(HW \log HW)$ global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Segmentation Loss (MASL), which automatically analyzes structure characteristics (compactness, tubularity, irregularity, scale) to modulate five complementary loss components through constrained learnable weights, eliminating manual per-dataset tuning. Comprehensive evaluation on 16 medical imaging datasets spanning 8 modalities demonstrates state-of-the-art performance: 96.12\% Dice on polyp segmentation, 83.77\% on surgical instruments (+17.85\% over the prior art) and 80.90\% on brain tumors, with consistent 3-18\% improvements over specialized baselines while using 3.5--6$\times$ fewer parameters than transformer-based methods.
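The truncated-FFT mixing idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SSTM: the function name `spectral_mix`, the `keep` parameter, and the fixed `weights` array (standing in for a learnable frequency filter) are all assumptions made for illustration, and the content-gated spatial projection is omitted. It only shows how keeping a small block of low-frequency modes gives every output pixel a dependence on every input pixel at FFT cost.

```python
import numpy as np

def spectral_mix(x, weights, keep):
    """Mix an (H, W) feature map globally through a truncated 2D FFT.

    x:       real array of shape (H, W)
    weights: complex array of shape (2 * keep, keep); a stand-in for a
             learnable frequency filter (hypothetical, fixed here)
    keep:    number of low-frequency modes retained per axis (hypothetical)
    """
    H, W = x.shape
    X = np.fft.rfft2(x)                 # spectrum, shape (H, W // 2 + 1)
    out = np.zeros_like(X)              # all discarded modes stay zero
    # Along axis 0, low frequencies sit at both ends of the spectrum
    # (non-negative frequencies first, negative frequencies last).
    out[:keep, :keep] = X[:keep, :keep] * weights[:keep]
    out[-keep:, :keep] = X[-keep:, :keep] * weights[keep:]
    return np.fft.irfft2(out, s=(H, W))  # back to the spatial domain

# Usage: an all-ones filter acts as a plain low-pass.
rows = np.cos(2 * np.pi * np.arange(8) / 8)
smooth = rows[:, None] * np.ones((1, 8))       # one low-frequency wave
checker = (-1.0) ** np.add.outer(np.arange(8), np.arange(8))  # highest frequency
w = np.ones((4, 2), dtype=complex)
smooth_out = spectral_mix(smooth, w, keep=2)
checker_out = spectral_mix(checker, w, keep=2)
```

With the all-ones filter, the smooth pattern should pass through unchanged while the checkerboard, which lives entirely in the discarded high-frequency modes, is suppressed. The two FFTs dominate the cost, which is where the $\mathcal{O}(HW \log HW)$ scaling quoted in the abstract comes from.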
Related papers
- Revisiting Global Token Mixing in Task-Dependent MRI Restoration: Insights from Minimal Gated CNN Baselines [43.505945728449774]
Global token mixing has become a popular model design choice for MRI restoration.
We ask whether global token mixing is actually beneficial in each individual task across three representative settings.
For accelerated MRI reconstruction, the minimal unrolled gated-CNN baseline is already highly competitive.
For super-resolution, where low-frequency k-space data are largely preserved by the controlled low-pass degradation, local gated models remain competitive.
For denoising with pronounced spatially heteroscedastic noise, token-mixing models achieve the strongest overall performance.
arXiv Detail & Related papers (2026-03-02T04:57:52Z)
- When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation [9.108764893521526]
We present the first comprehensive benchmarking of convolutional neural networks, vision transformers and state-space mamba architectures for automated dental caries segmentation on panoramic radiographs on the DC1000 dataset.
Results reveal that, contrary to the growing trend toward complex attention-based architectures, the CNN-based DoubleU-Net achieved the highest Dice coefficient of 0.7345, mIoU of 0.5978, and precision of 0.8145, outperforming all transformer and Mamba variants.
arXiv Detail & Related papers (2025-11-18T19:16:21Z)
- ReCoSeg++: Extended Residual-Guided Cross-Modal Diffusion for Brain Tumor Segmentation [0.9374652839580183]
We propose a semi-supervised, two-stage framework that extends the ReCoSeg approach to the larger and more heterogeneous BraTS 2021 dataset.
In the first stage, a residual-guided denoising diffusion probabilistic model (DDPM) performs cross-modal synthesis by reconstructing the T1ce modality from FLAIR, T1, and T2 scans.
In the second stage, a lightweight U-Net takes as input the concatenation of residual maps, computed as the difference between real T1ce and synthesized T1ce, with T1, T2, and FLAIR modalities to improve whole tumor segmentation.
arXiv Detail & Related papers (2025-08-01T20:24:31Z)
- trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images [39.58317527488534]
We introduce trAIce3D, a deep-learning architecture designed for precise microglia segmentation.
It employs a two-stage approach: first, a 3D U-Net with vision transformers in the encoder detects somas using a sliding-window technique to cover the entire image.
It then refines each soma and its branches by using soma coordinates as a prompt and a 3D window around the target cell as input.
Trained and evaluated on a dataset of 41,230 microglial cells, trAIce3D significantly improves segmentation accuracy and scalable generalization.
arXiv Detail & Related papers (2025-07-30T12:54:53Z)
- Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS) in Edge Iterative MRI Lesion Localization System (EdgeIMLocSys) [6.451534509235736]
We propose the Edge Iterative MRI Lesion Localization System (EdgeIMLocSys), which integrates Continuous Learning from Human Feedback.
Central to this system is the Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor (GMLN-BTS).
Our proposed GMLN-BTS model achieves a Dice score of 85.1% on the BraTS 2017 dataset with only 4.58 million parameters, representing a 98% reduction compared to mainstream 3D Transformer models.
arXiv Detail & Related papers (2025-07-14T07:29:49Z)
- Automated MRI Tumor Segmentation using hybrid U-Net with Transformer and Efficient Attention [0.7456526005219317]
Cancer is an abnormal growth with potential to invade locally and metastasize to distant organs.
Recent AI-based segmentation models are generally trained on large public datasets.
This study develops and integrates AI tumor segmentation models directly into hospital software for efficient and accurate oncology treatment planning and execution.
arXiv Detail & Related papers (2025-06-18T15:36:37Z)
- LATUP-Net: A Lightweight 3D Attention U-Net with Parallel Convolutions for Brain Tumor Segmentation [7.1789008189318455]
LATUP-Net is a lightweight 3D ATtention U-Net with Parallel convolutions.
It is specifically designed to reduce computational requirements significantly while maintaining high segmentation performance.
It achieves promising segmentation performance: the average Dice scores for the whole tumor, tumor core, and enhancing tumor on the BraTS 2020 dataset are 88.41%, 83.82%, and 73.67%, and on the BraTS 2021 dataset, they are 90.29%, 89.54%, and 83.92%, respectively.
arXiv Detail & Related papers (2024-04-09T00:05:45Z)
- Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models.
We show that our novel network achieves remarkable Dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
- 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, colon cancer segmentation, and achieve similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z)
- CNN-based fully automatic wrist cartilage volume quantification in MR Image [55.41644538483948]
The U-net convolutional neural network with additional attention layers provides the best wrist cartilage segmentation performance.
The error of cartilage volume measurement should be assessed independently using a non-MRI method.
arXiv Detail & Related papers (2022-06-22T14:19:06Z)
- Negligible effect of brain MRI data preprocessing for tumor segmentation [36.89606202543839]
We conduct experiments on three publicly available datasets and evaluate the effect of different preprocessing steps in deep neural networks.
Our results demonstrate that most popular standardization steps add no value to the network performance.
We suggest that image intensity normalization approaches do not contribute to model accuracy because of the reduction of signal variance with image standardization.
arXiv Detail & Related papers (2022-04-11T17:29:36Z)
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose and scale invariant thanks to the use of Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners.
arXiv Detail & Related papers (2021-07-06T14:50:03Z)
- Fully Automated 3D Segmentation of MR-Imaged Calf Muscle Compartments: Neighborhood Relationship Enhanced Fully Convolutional Network [6.597152960878372]
FilterNet is a novel fully convolutional network (FCN) that embeds edge-aware constraints for individual calf muscle compartment segmentations.
FCN was evaluated on 40 T1-weighted MR images of 10 healthy and 30 diseased subjects by 4-fold cross-validation.
arXiv Detail & Related papers (2020-06-21T22:53:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.