Related papers: HistoViT: Vision Transformer for Accurate and Scalable Histopathological Cancer Diagnosis

URL: http://arxiv.org/abs/2508.11181v1
Date: Fri, 15 Aug 2025 03:10:52 GMT
Title: HistoViT: Vision Transformer for Accurate and Scalable Histopathological Cancer Diagnosis
Authors: Faisal Ahmed
Abstract summary: We propose a transformer-based deep learning framework for multi-class tumor classification. Our method addresses key limitations of conventional convolutional neural networks. Our approach achieves classification accuracies of 99.32%, 96.92%, 95.28%, and 96.94% for breast, prostate, bone, and cervical cancers respectively.
Score: 1.5939351525664014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate and scalable cancer diagnosis remains a critical challenge in modern pathology, particularly for malignancies such as breast, prostate, bone, and cervical cancers, which exhibit complex histological variability. In this study, we propose a transformer-based deep learning framework for multi-class tumor classification in histopathological images. Leveraging a fine-tuned Vision Transformer (ViT) architecture, our method addresses key limitations of conventional convolutional neural networks, offering improved performance, reduced preprocessing requirements, and enhanced scalability across tissue types. To adapt the model for histopathological cancer images, we implement a streamlined preprocessing pipeline that converts tiled whole-slide images into PyTorch tensors and standardizes them through data normalization. This ensures compatibility with the ViT architecture and enhances both convergence stability and overall classification performance. We evaluate our model on four benchmark datasets, ICIAR2018 (breast), SICAPv2 (prostate), UT-Osteosarcoma (bone), and SipakMed (cervical), demonstrating consistent outperformance over existing deep learning methods. Our approach achieves classification accuracies of 99.32%, 96.92%, 95.28%, and 96.94% for breast, prostate, bone, and cervical cancers respectively, with area under the ROC curve (AUC) scores exceeding 99% across all datasets. These results confirm the robustness, generalizability, and clinical potential of transformer-based architectures in digital pathology. Our work represents a significant advancement toward reliable, automated, and interpretable cancer diagnosis systems that can alleviate diagnostic burdens and improve healthcare outcomes.
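The preprocessing step described in the abstract (tile to tensor, then normalization) can be illustrated with a minimal sketch. This is not the authors' code: it uses NumPy arrays as a stand-in for PyTorch tensors, the function name `preprocess_tile` is hypothetical, and the ImageNet channel statistics are an assumption, since the abstract does not say which normalization values were used.

```python
import numpy as np

# Assumed normalization statistics (the common ImageNet values);
# the abstract does not state which statistics the paper used.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_tile(tile_uint8):
    """Convert an H x W x 3 uint8 RGB tile to a normalized 3 x H x W float32 array.

    Hypothetical helper: mirrors the usual "to tensor, then normalize"
    pattern, with NumPy standing in for PyTorch.
    """
    if tile_uint8.ndim != 3 or tile_uint8.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB tile")
    scaled = tile_uint8.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    normalized = (scaled - MEAN) / STD              # per-channel standardization
    return np.transpose(normalized, (2, 0, 1))      # channel-first layout


# Example: a synthetic all-white 224 x 224 tile (the usual ViT input size).
tile = np.full((224, 224, 3), 255, dtype=np.uint8)
out = preprocess_tile(tile)
```

The channel-first output matches the layout that PyTorch vision models conventionally expect, so an array like `out` could be wrapped in a tensor and batched before being passed to a ViT.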

Related papers

A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
Efficient Breast and Ovarian Cancer Classification via ViT-Based Preprocessing and Transfer Learning [0.7088460451473201]
We introduce a novel vision transformer (ViT)-based method for detecting and classifying breast and ovarian cancer. We use a pre-trained ViT-Base-Patch16-224 model, which is fine-tuned for both binary and multi-class classification tasks. Our model surpasses existing CNN, ViT, and topological data analysis-based approaches in binary classification.
arXiv Detail & Related papers (2025-09-23T02:25:44Z)
A Dual-Task Synergy-Driven Generalization Framework for Pancreatic Cancer Segmentation in CT Scans [10.62594407632477]
Pancreatic cancer, characterized by its notable prevalence and mortality rates, demands accurate lesion delineation. We propose a generalization framework that synergizes pixel-level classification and regression tasks to accurately delineate lesions. Our model successfully improves the results of the highly challenging cross-lesion generalized pancreatic cancer segmentation task by 9.51%.
arXiv Detail & Related papers (2025-05-03T00:54:00Z)
Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis [16.268045905735818]
We propose CMSwinKAN, a contrastive-learning-based multi-scale feature fusion model tailored for pathological image classification. By fusing multi-scale features and leveraging contrastive learning strategies, CMSwinKAN mimics clinicians' comprehensive approach. Results demonstrate that CMSwinKAN performs better than existing state-of-the-art pathology-specific models pre-trained on large datasets.
arXiv Detail & Related papers (2025-04-18T15:39:46Z)
GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis [44.99833362998488]
We present a novel approach that combines 2D Gaussian splatting with the Transformer UNet architecture for automated skin cancer diagnosis. Our findings illustrate significant advancements in the precision of segmentation and classification. This integration sets new benchmarks in the field and highlights the potential for further research into multi-task medical image analysis methodologies.
arXiv Detail & Related papers (2025-02-23T23:28:47Z)
Evaluation of Vision Transformers for Multimodal Image Classification: A Case Study on Brain, Lung, and Kidney Tumors [0.0]
The work evaluates the performance of Vision Transformer architectures, including Swin Transformer and MaxViT, on several datasets of MRI and CT scans. The results revealed that the Swin Transformer provided high accuracy, achieving up to 99% on average for individual datasets and 99.4% accuracy for the combined dataset.
arXiv Detail & Related papers (2025-02-08T10:35:51Z)
Synthetic CT image generation from CBCT: A Systematic Review [44.01505745127782]
Generation of synthetic CT (sCT) images from cone-beam CT (CBCT) data using deep learning methodologies represents a significant advancement in radiation oncology. A total of 35 relevant studies were identified and analyzed, revealing the prevalence of deep learning approaches in the generation of sCT.
arXiv Detail & Related papers (2025-01-22T13:54:07Z)
Advanced Hybrid Deep Learning Model for Enhanced Classification of Osteosarcoma Histopathology Images [0.0]
This study focuses on osteosarcoma (OS), the most common bone cancer in children and adolescents, which affects the long bones of the arms and legs. We propose a novel hybrid model that combines convolutional neural networks (CNN) and vision transformers (ViT) to improve diagnostic accuracy for OS. The model achieved an accuracy of 99.08%, precision of 99.10%, recall of 99.28%, and an F1-score of 99.23%.
arXiv Detail & Related papers (2024-10-29T13:54:08Z)
Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist in the image patterns between the same case's CT images and its pathological images. We propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
arXiv Detail & Related papers (2023-08-09T02:04:05Z)
Meta-information-aware Dual-path Transformer for Differential Diagnosis of Multi-type Pancreatic Lesions in Multi-phase CT [41.199716328468895]
We develop a dual-path transformer to exploit the feasibility of classification and segmentation of pancreatic lesions. The proposed method consists of a CNN-based segmentation path (S-path) and a transformer-based classification path (C-path). Our results show that our method can enable accurate classification and segmentation of the full taxonomy of pancreatic lesions.
arXiv Detail & Related papers (2023-03-02T03:34:28Z)
Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe). We propose a versatile framework that fuses appearance-based semi-supervision, mask-based adversarial domain adaptation, and pseudo-labeling. CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.