Related papers: When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation

When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation

URL: http://arxiv.org/abs/2511.14860v1
Date: Tue, 18 Nov 2025 19:16:21 GMT
Title: When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation
Authors: Aashish Ghimire, Jun Zeng, Roshan Paudel, Nikhil Kumar Tomar, Deepak Ranjan Nayak, Harshith Reddy Nalla, Vivek Jha, Glenda Reynolds, Debesh Jha,
Abstract summary: We present the first comprehensive benchmarking of convolutional neural networks, vision transformers and state-space mamba architectures for automated dental caries segmentation on panoramic radiographs through a DC1000 dataset.<n>Results reveal that, contrary to the growing trend toward complex attention based architectures, the CNN-based DoubleU-Net achieved the highest dice coefficient of 0.7345, mIoU of 0.5978, and precision of 0.8145, outperforming all transformer and Mamba variants.
Score: 9.108764893521526
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate identification and segmentation of dental caries in panoramic radiographs are critical for early diagnosis and effective treatment planning. Automated segmentation remains challenging due to low lesion contrast, morphological variability, and limited annotated data. In this study, we present the first comprehensive benchmarking of convolutional neural networks, vision transformers and state-space mamba architectures for automated dental caries segmentation on panoramic radiographs through a DC1000 dataset. Twelve state-of-the-art architectures, including VMUnet, MambaUNet, VMUNetv2, RMAMamba-S, TransNetR, PVTFormer, DoubleU-Net, and ResUNet++, were trained under identical configurations. Results reveal that, contrary to the growing trend toward complex attention based architectures, the CNN-based DoubleU-Net achieved the highest dice coefficient of 0.7345, mIoU of 0.5978, and precision of 0.8145, outperforming all transformer and Mamba variants. In the study, the top 3 results across all performance metrics were achieved by CNN-based architectures. Here, Mamba and transformer-based methods, despite their theoretical advantage in global context modeling, underperformed due to limited data and weaker spatial priors. These findings underscore the importance of architecture-task alignment in domain-specific medical image segmentation more than model complexity. Our code is available at: https://github.com/JunZengz/dental-caries-segmentation.

Related papers

When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation [10.656996937993199]
We introduce UKAST, a U-Net like architecture that integrates rational-function based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders.<n>UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks.
arXiv Detail & Related papers (2025-11-06T05:44:57Z)
Performance Analysis of Deep Learning Models for Femur Segmentation in MRI Scan [5.5193366921929155]
We evaluate and compare the performance of three CNN-based models, i.e., U-Net, Attention U-Net, and U-KAN, and one transformer-based model, SAM 2.<n>The dataset comprises 11,164 MRI scans with detailed annotations of femoral regions.<n> Attention U-Net achieves the highest overall scores, while U-KAN demonstrated superior performance in anatomical regions with a smaller region of interest.
arXiv Detail & Related papers (2025-04-05T05:47:56Z)
MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation [6.673169053236727]
We propose MambaClinix, a novel U-shaped architecture for medical image segmentation. MambaClinix integrates a hierarchical gated convolutional network with Mamba in an adaptive stage-wise framework. Our results show that MambaClinix achieves high segmentation accuracy while maintaining low model complexity.
arXiv Detail & Related papers (2024-09-19T07:51:14Z)
Deep models for stroke segmentation: do complex architectures always perform better? [1.4651272514940197]
Stroke segmentation plays a crucial role in the diagnosis and treatment of stroke patients. Deep models have been introduced for general medical image segmentation. In this study, we selected four types of deep models that were recently proposed and evaluated their performance for stroke segmentation.
arXiv Detail & Related papers (2024-03-25T20:44:01Z)
Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models. We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
Brain Tumor Radiogenomic Classification [1.8276368987462532]
The RSNA-MICCAI brain tumor radiogenomic classification challenge aimed to predict MGMT biomarker status in glioblastoma through binary classification. The dataset is splitted into three main cohorts: training set, validation set which were used during training, and the testing were only used during final evaluation.
arXiv Detail & Related papers (2024-01-11T10:30:09Z)
Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification. Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations. In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the celluar graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which CT imaging process is finely embedded. We analyze the CT values among different tissues, and merge the prior observations into a prior network for our InDuDoNet+, which significantly improve its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z)
Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images. Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose and scale invariant thanks to the use of Spatial Transformer Network (STN) Our architecture is composed of three sequential modules that are estimated together during training. We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners.
arXiv Detail & Related papers (2021-07-06T14:50:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.