S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
- URL: http://arxiv.org/abs/2509.23442v1
- Date: Sat, 27 Sep 2025 18:18:39 GMT
- Title: S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
- Authors: Md. Saiful Bari Siddiqui, Mohammed Imamul Hassan Bhuiyan,
- Abstract summary: We propose a dual-branch framework that learns from both spatial and spectral representations simultaneously.<n>We evaluate S$3$F-Net across four medical imaging datasets spanning different modalities.<n>Our framework consistently and significantly outperforms its strong spatial-only baseline in all cases.
- Score: 0.20625936401496228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks have become a cornerstone of medical image analysis due to their proficiency in learning hierarchical spatial features. However, this focus on a single domain is inefficient at capturing global, holistic patterns and fails to explicitly model an image's frequency-domain characteristics. To address these challenges, we propose the Spatial-Spectral Summarizer Fusion Network (S$^3$F-Net), a dual-branch framework that learns from both spatial and spectral representations simultaneously. The S$^3$F-Net performs a fusion of a deep spatial CNN with our proposed shallow spectral encoder, SpectraNet. SpectraNet features the proposed SpectralFilter layer, which leverages the Convolution Theorem by applying a bank of learnable filters directly to an image's full Fourier spectrum via a computation-efficient element-wise multiplication. This allows the SpectralFilter layer to attain a global receptive field instantaneously, with its output being distilled by a lightweight summarizer network. We evaluate S$^3$F-Net across four medical imaging datasets spanning different modalities to validate its efficacy and generalizability. Our framework consistently and significantly outperforms its strong spatial-only baseline in all cases, with accuracy improvements of up to 5.13%. With a powerful Bilinear Fusion, S$^3$F-Net achieves a SOTA competitive accuracy of 98.76% on the BRISC2025 dataset. Concatenation Fusion performs better on the texture-dominant Chest X-Ray Pneumonia dataset, achieving 93.11% accuracy, surpassing many top-performing, much deeper models. Our explainability analysis also reveals that the S$^3$F-Net learns to dynamically adjust its reliance on each branch based on the input pathology. These results verify that our dual-domain approach is a powerful and generalizable paradigm for medical image analysis.
Related papers
- Phi-SegNet: Phase-Integrated Supervision for Medical Image Segmentation [1.76179873429447]
We propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware information at both architectural and optimization levels.<n>Phi-SegNet consistently achieved state-of-the-art performance on five public datasets spanning X-ray, US, histopathology, MRI, and colonoscopy.
arXiv Detail & Related papers (2026-01-22T16:00:41Z) - FractMorph: A Fractional Fourier-Based Multi-Domain Transformer for Deformable Image Registration [0.6683923149620578]
We present FractMorph, a novel 3D dual-parallel transformer-based architecture that enhances cross-image feature matching.<n>A lightweight U-Net style network then predicts a dense deformation field from the transformer-enriched features.<n>Results show FractMorph achieves state-of-the-art performance with an overall Dice Similarity Coefficient (DSC) of $86.45%$, an average per-structure of $75.15%$, and a 95th-percentile Hausdorff distance (HD95) of $1.54mathrmmm$ on our data split.
arXiv Detail & Related papers (2025-08-17T17:42:10Z) - FreqCross: A Multi-Modal Frequency-Spatial Fusion Network for Robust Detection of Stable Diffusion 3.5 Generated Images [4.524282351757178]
FreqCross is a novel multi-modal fusion network that combines spatial RGB features, frequency domain artifacts, and radial energy distribution patterns.<n>Experiments on a dataset of 10,000 paired real (MS-COCO) and synthetic (Stable Diffusion 3.5) images demonstrate that FreqCross achieves 97.8% accuracy.
arXiv Detail & Related papers (2025-07-01T22:12:35Z) - How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings [106.3726679697804]
We compare the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE)<n>MPEs are seen as the standard for low dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail.<n>We prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding.
arXiv Detail & Related papers (2025-04-18T02:18:08Z) - Robust Hyperspectral Image Panshapring via Sparse Spatial-Spectral Representation [9.3350274016294]
S$3$RNet is a novel framework for hyperspectral image pansharpening.<n>It combines low-resolution hyperspectral images (LRHSI) with high-resolution multispectral images (HRMSI) through sparse spatial-spectral representation.<n>S$3$RNet achieves state-of-the-art performance across multiple evaluation metrics.
arXiv Detail & Related papers (2025-01-14T09:09:14Z) - CMTNet: Convolutional Meets Transformer Network for Hyperspectral Images Classification [3.821081081400729]
Current convolutional neural networks (CNNs) focus on local features in hyperspectral data.<n> Transformer framework excels at extracting global features from hyperspectral imagery.<n>This research introduces the Convolutional Meet Transformer Network (CMTNet)
arXiv Detail & Related papers (2024-06-20T07:56:51Z) - SpectralFormer: Rethinking Hyperspectral Image Classification with
Transformers [91.09957836250209]
Hyperspectral (HS) images are characterized by approximately contiguous spectral information.
CNNs have been proven to be a powerful feature extractor in HS image classification.
We propose a novel backbone network called ulSpectralFormer for HS image classification.
arXiv Detail & Related papers (2021-07-07T02:59:21Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z) - Spectral Analysis Network for Deep Representation Learning and Image
Clustering [53.415803942270685]
This paper proposes a new network structure for unsupervised deep representation learning based on spectral analysis.
It can identify the local similarities among images in patch level and thus more robust against occlusion.
It can learn more clustering-friendly representations and is capable to reveal the deep correlations among data samples.
arXiv Detail & Related papers (2020-09-11T05:07:15Z) - Hyperspectral Image Super-resolution via Deep Progressive Zero-centric
Residual Learning [62.52242684874278]
Cross-modality distribution of spatial and spectral information makes the problem challenging.
We propose a novel textitlightweight deep neural network-based framework, namely PZRes-Net.
Our framework learns a high resolution and textitzero-centric residual image, which contains high-frequency spatial details of the scene.
arXiv Detail & Related papers (2020-06-18T06:32:11Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.