RS-CA-HSICT: A Residual and Spatial Channel Augmented CNN Transformer Framework for Monkeypox Detection
- URL: http://arxiv.org/abs/2511.15476v1
- Date: Wed, 19 Nov 2025 14:32:34 GMT
- Title: RS-CA-HSICT: A Residual and Spatial Channel Augmented CNN Transformer Framework for Monkeypox Detection
- Authors: Rashid Iqbal, Saddam Hussain Khan,
- Abstract summary: This work proposes a hybrid deep learning approach, namely Residual and Spatial Learning based Channel Augmented Integrated CNN-Transformer architecture.<n>The proposed RS-CA-HSICT framework is composed of an HSICT block, a residual CNN module, a spatial CNN block, and a CA.<n> Experimental results on both the Kaggle benchmark and a diverse MPox dataset reported classification accuracy as high as 98.30% and an F1-score of 98.13%, which outperforms the existing CNNs and ViTs.
- Score: 0.7857499581522376
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work proposes a hybrid deep learning approach, namely Residual and Spatial Learning based Channel Augmented Integrated CNN-Transformer architecture, that leverages the strengths of CNN and Transformer towards enhanced MPox detection. The proposed RS-CA-HSICT framework is composed of an HSICT block, a residual CNN module, a spatial CNN block, and a CA, which enhances the diverse feature space, detailed lesion information, and long-range dependencies. The new HSICT module first integrates an abstract representation of the stem CNN and customized ICT blocks for efficient multihead attention and structured CNN layers with homogeneous (H) and structural (S) operations. The customized ICT blocks learn global contextual interactions and local texture extraction. Additionally, H and S layers learn spatial homogeneity and fine structural details by reducing noise and modeling complex morphological variations. Moreover, inverse residual learning enhances vanishing gradient, and stage-wise resolution reduction ensures scale invariance. Furthermore, the RS-CA-HSICT framework augments the learned HSICT channels with the TL-driven Residual and Spatial CNN maps for enhanced multiscale feature space capturing global and localized structural cues, subtle texture, and contrast variations. These channels, preceding augmentation, are refined through the Channel-Fusion-and-Attention block, which preserves discriminative channels while suppressing redundant ones, thereby enabling efficient computation. Finally, the spatial attention mechanism refines pixel selection to detect subtle patterns and intra-class contrast variations in Mpox. Experimental results on both the Kaggle benchmark and a diverse MPox dataset reported classification accuracy as high as 98.30% and an F1-score of 98.13%, which outperforms the existing CNNs and ViTs.
Related papers
- Residual-SwinCA-Net: A Channel-Aware Integrated Residual CNN-Swin Transformer for Malignant Lesion Segmentation in BUSI [1.759752510445115]
A novel deep hybrid Residual-SwinCA-Net segmentation framework is proposed in the study.<n>For learning global dependencies, Swin Transformer blocks are customized using internal residual pathways.<n>The Residual-SwinCA-Net and existing CNNs/ViTs techniques have been implemented on the publicly available BUSI dataset.
arXiv Detail & Related papers (2025-12-09T04:52:30Z) - CE-RS-SBCIT A Novel Channel Enhanced Hybrid CNN Transformer with Residual, Spatial, and Boundary-Aware Learning for Brain Tumor MRI Analysis [0.7857499581522376]
The framework exploits local fine-grained and global contextual cues through four core innovations.<n>The framework achieves 98.30% accuracy, 98.08% sensitivity, 98.25% F1-score, and 98.43% precision.
arXiv Detail & Related papers (2025-08-23T20:09:39Z) - A Novel Channel Boosted Residual CNN-Transformer with Regional-Boundary Learning for Breast Cancer Detection [6.6149904114661]
This study introduces a novel hybrid framework, CB-Res-RBCMT, combining customized residual CNNs and new ViT components for detailed BUSI cancer analysis.<n>The proposed RBCMT uses stem convolution blocks with CNN Meet Transformer (CMT) blocks, followed by new Regional and boundary (RB) feature extraction operations.<n>The proposed CB-Res-RBCMT achieves an F1-score of 95.57%, accuracy of 95.63%, sensitivity of 96.42%, and precision of 94.79% on the standard harmonized stringent BUSI dataset.
arXiv Detail & Related papers (2025-03-19T08:59:02Z) - CSHNet: A Novel Information Asymmetric Image Translation Method [57.22010952287759]
We propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES)<n>CSHNet outperforms existing methods in both visual quality and performance metrics across scene-level and instance-level datasets.
arXiv Detail & Related papers (2025-01-17T13:44:54Z) - CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation [60.08541107831459]
This paper proposes a CNN-Transformer rectified collaborative learning framework to learn stronger CNN-based and Transformer-based models for medical image segmentation.
Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels.
We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space.
arXiv Detail & Related papers (2024-08-25T01:27:35Z) - A Novel Feature Map Enhancement Technique Integrating Residual CNN and Transformer for Alzheimer Diseases Diagnosis [1.2432046687586285]
Alzheimer diseases (ADs) involves cognitive decline and abnormal brain protein accumulation, necessitating timely diagnosis for effective treatment.
CAD systems leveraging deep learning advancements have demonstrated success in AD detection but pose computational intricacies and the dataset minor contrast, structural, and texture variations.
This approach integrates three distinct elements: a novel CNN Meet Transformer (HSCMT), customized residual learning CNN, and a new Feature Map Enhancement (FME) strategy to learn diverse morphological, contrast, and texture variations of ADs.
arXiv Detail & Related papers (2024-03-30T10:17:13Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - A heterogeneous group CNN for image super-resolution [127.2132400582117]
Convolutional neural networks (CNNs) have obtained remarkable performance via deep architectures.
We present a heterogeneous group SR CNN (HGSRCNN) via leveraging structure information of different types to obtain a high-quality image.
arXiv Detail & Related papers (2022-09-26T04:14:59Z) - Image Super-resolution with An Enhanced Group Convolutional Neural
Network [102.2483249598621]
CNNs with strong learning abilities are widely chosen to resolve super-resolution problem.
We present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture.
Experiments report that our ESRGCNN surpasses the state-of-the-arts in terms of SISR performance, complexity, execution speed, image quality evaluation and visual effect in SISR.
arXiv Detail & Related papers (2022-05-29T00:34:25Z) - Learning A 3D-CNN and Transformer Prior for Hyperspectral Image
Super-Resolution [80.93870349019332]
We propose a novel HSISR method that uses Transformer instead of CNN to learn the prior of HSIs.
Specifically, we first use the gradient algorithm to solve the HSISR model, and then use an unfolding network to simulate the iterative solution processes.
arXiv Detail & Related papers (2021-11-27T15:38:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.