Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis
- URL: http://arxiv.org/abs/2509.05343v1
- Date: Tue, 02 Sep 2025 05:45:27 GMT
- Title: Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis
- Authors: Zahid Ullah, Minki Hong, Tahir Mahmood, Jihie Kim,
- Abstract summary: We integrate attention mechanisms into five widely adopted CNN architectures.<n>Each baseline model is augmented with either a Squeeze and Excitation block or a hybrid Convolutional Block Attention Module.<n>We show that attention augmented CNNs consistently outperform baseline architectures across all metrics.
- Score: 7.082043294630761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has become a powerful tool for medical image analysis; however, conventional Convolutional Neural Networks (CNNs) often fail to capture the fine-grained and complex features critical for accurate diagnosis. To address this limitation, we systematically integrate attention mechanisms into five widely adopted CNN architectures, namely, VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5, to enhance their ability to focus on salient regions and improve discriminative performance. Specifically, each baseline model is augmented with either a Squeeze and Excitation block or a hybrid Convolutional Block Attention Module, allowing adaptive recalibration of channel and spatial feature representations. The proposed models are evaluated on two distinct medical imaging datasets, a brain tumor MRI dataset comprising multiple tumor subtypes, and a Products of Conception histopathological dataset containing four tissue categories. Experimental results demonstrate that attention augmented CNNs consistently outperform baseline architectures across all metrics. In particular, EfficientNetB5 with hybrid attention achieves the highest overall performance, delivering substantial gains on both datasets. Beyond improved classification accuracy, attention mechanisms enhance feature localization, leading to better generalization across heterogeneous imaging modalities. This work contributes a systematic comparative framework for embedding attention modules in diverse CNN architectures and rigorously assesses their impact across multiple medical imaging tasks. The findings provide practical insights for the development of robust, interpretable, and clinically applicable deep learning based decision support systems.
Related papers
- MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation [11.967890140626716]
We propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion mechanism.<n>A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency.<n>Our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity.
arXiv Detail & Related papers (2025-10-04T11:25:10Z) - CENet: Context Enhancement Network for Medical Image Segmentation [3.4690322157094573]
We propose the Context Enhancement Network (CENet), a novel segmentation framework featuring two key innovations.<n>First, the Dual Selective Enhancement Block (DSEB) integrated into skip connections enhances boundary details and improves the detection of smaller organs in a context-aware manner.<n>Second, the Context Feature Attention Module (CFAM) in the decoder employs a multi-scale design to maintain spatial integrity, reduce feature redundancy, and mitigate overly enhanced representations.
arXiv Detail & Related papers (2025-05-23T23:22:18Z) - ConnectomeDiffuser: Generative AI Enables Brain Network Construction from Diffusion Tensor Imaging [36.89452147894995]
Brain network analysis plays a crucial role in diagnosing and monitoring neurodegenerative disorders such as Alzheimer's disease (AD)<n>Existing approaches for constructing structural brain networks from diffusion tensor imaging (DTI) often rely on specialized toolkits that suffer from inherent limitations.<n>This work proposes a novel diffusion-based framework for automated end-to-end brain network construction from DTI.
arXiv Detail & Related papers (2025-05-23T15:03:58Z) - FundusGAN: A Hierarchical Feature-Aware Generative Framework for High-Fidelity Fundus Image Generation [35.46876389599076]
FundusGAN is a novel hierarchical feature-aware generative framework specifically designed for high-fidelity fundus image synthesis.<n>We show that FundusGAN consistently outperforms state-of-the-art methods across multiple metrics.
arXiv Detail & Related papers (2025-03-22T18:08:07Z) - Hybrid Interpretable Deep Learning Framework for Skin Cancer Diagnosis: Integrating Radial Basis Function Networks with Explainable AI [1.1049608786515839]
Skin cancer is one of the most prevalent and potentially life-threatening diseases worldwide.<n>We propose a novel hybrid deep learning framework that integrates convolutional neural networks (CNNs) with Radial Basis Function (RBF) Networks to achieve high classification accuracy and enhanced interpretability.
arXiv Detail & Related papers (2025-01-24T19:19:02Z) - HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection [17.290208035331734]
Anatomical landmark detection (ALD) from a medical image is crucial for a wide array of clinical applications.<n>We propose a novel hybrid architecture that integrates CNNs and Transformers.<n> Experiments on five diverse datasets demonstrate state-of-the-art performance, surpassing existing methods in accuracy, robustness, and efficiency.
arXiv Detail & Related papers (2024-12-09T13:58:00Z) - CAVE-Net: Classifying Abnormalities in Video Capsule Endoscopy [0.1937002985471497]
We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets.<n>We leverage the unique feature extraction capabilities of each model to enhance the overall accuracy.<n>By using these methods, the proposed framework, CAVE-Net, provides robust feature discrimination and improved classification results.
arXiv Detail & Related papers (2024-10-26T17:25:08Z) - L-SFAN: Lightweight Spatially-focused Attention Network for Pain Behavior Detection [44.016805074560295]
Chronic Low Back Pain (CLBP) afflicts millions globally, significantly impacting individuals' well-being and imposing economic burdens on healthcare systems.
While artificial intelligence (AI) and deep learning offer promising avenues for analyzing pain-related behaviors to improve rehabilitation strategies, current models, including convolutional neural networks (CNNs), have limitations.
We introduce hbox EmoL-SFAN, a lightweight CNN architecture incorporating 2D filters designed to capture the spatial-temporal interplay of data from motion capture and surface electromyography sensors.
arXiv Detail & Related papers (2024-06-07T12:01:37Z) - Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment [0.0]
We introduce Mamba-Ahnet, a novel integration of State Space Model (SSM) and Advanced Hierarchical Network (AHNet) within the MAMBA framework.
Mamba-Ahnet combines SSM's feature extraction and comprehension with AHNet's attention mechanisms and image reconstruction, aiming to enhance segmentation accuracy and robustness.
arXiv Detail & Related papers (2024-04-26T08:15:43Z) - TransAttUnet: Multi-level Attention-guided U-Net with Transformer for
Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T09:17:06Z) - Ensemble manifold based regularized multi-modal graph convolutional
network for cognitive ability prediction [33.03449099154264]
Multi-modal functional magnetic resonance imaging (fMRI) can be used to make predictions about individual behavioral and cognitive traits based on brain connectivity networks.
We propose an interpretable multi-modal graph convolutional network (MGCN) model, incorporating the fMRI time series and the functional connectivity (FC) between each pair of brain regions.
We validate our MGCN model on the Philadelphia Neurodevelopmental Cohort to predict individual wide range achievement test (WRAT) score.
arXiv Detail & Related papers (2021-01-20T20:53:07Z) - Few-shot Medical Image Segmentation using a Global Correlation Network
with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.