Vision Transformers with Autoencoders and Explainable AI for Cancer Patient Risk Stratification Using Whole Slide Imaging
- URL: http://arxiv.org/abs/2504.04749v2
- Date: Tue, 08 Apr 2025 03:59:22 GMT
- Title: Vision Transformers with Autoencoders and Explainable AI for Cancer Patient Risk Stratification Using Whole Slide Imaging
- Authors: Ahmad Hussein, Mukesh Prasad, Ali Anaissi, Ali Braytee
- Abstract summary: PATH-X is a framework that integrates Vision Transformers (ViT) and Autoencoders with SHAP (Shapley Additive Explanations) to enhance model explainability for patient stratification and risk prediction. A representative image slice is selected from each WSI, and numerical feature embeddings are extracted using Google's pre-trained ViT. Kaplan-Meier survival analysis is applied to evaluate stratification into two and three risk groups.
- Score: 3.6940298700319065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cancer remains one of the leading causes of mortality worldwide, necessitating accurate diagnosis and prognosis. Whole Slide Imaging (WSI) has become an integral part of clinical workflows with advancements in digital pathology. While various studies have utilized WSIs, their extracted features may not fully capture the most relevant pathological information, and their lack of interpretability limits clinical adoption. In this paper, we propose PATH-X, a framework that integrates Vision Transformers (ViT) and Autoencoders with SHAP (Shapley Additive Explanations) to enhance model explainability for patient stratification and risk prediction using WSIs from The Cancer Genome Atlas (TCGA). A representative image slice is selected from each WSI, and numerical feature embeddings are extracted using Google's pre-trained ViT. These features are then compressed via an autoencoder and used for unsupervised clustering and classification tasks. Kaplan-Meier survival analysis is applied to evaluate stratification into two and three risk groups. SHAP is used to identify key contributing features, which are mapped onto histopathological slices to provide spatial context. PATH-X demonstrates strong performance in breast and glioma cancers, where a sufficient number of WSIs enabled robust stratification. However, performance in lung cancer was limited due to data availability, emphasizing the need for larger datasets to enhance model reliability and clinical applicability.
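The Kaplan-Meier analysis used to evaluate the two- and three-group stratification follows the standard product-limit estimator. The sketch below is a minimal illustration with hypothetical follow-up data, not the paper's implementation:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimates.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) occurred, 0 if censored
    Returns (event_times, survival probability after each event time).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    surv = 1.0
    out_t, out_s = [], []
    for t in np.unique(times):            # unique times, ascending
        d = events[times == t].sum()      # events observed at time t
        n = (times >= t).sum()            # patients still at risk
        if d > 0:
            surv *= 1.0 - d / n
            out_t.append(t)
            out_s.append(surv)
    return out_t, out_s

# hypothetical follow-up data for one risk group (months, event flags)
t, s = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

Comparing the resulting curves between risk groups (e.g. with a log-rank test) is what establishes whether the stratification is meaningful.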
Related papers
- Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis [16.268045905735818]
CMSwinKAN is a contrastive-learning-based multi-scale feature fusion model tailored for pathological image classification.
We introduce a soft voting mechanism guided by clinical insights to seamlessly bridge patch-level predictions to whole slide image-level classifications.
Results demonstrate that CMSwinKAN performs better than existing state-of-the-art pathology-specific models pre-trained on large datasets.
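A generic way to bridge patch-level predictions to a slide-level label is to average per-patch class probabilities and take the argmax; this sketch omits the clinical-insight guidance that CMSwinKAN's mechanism adds:

```python
def soft_vote(patch_probs):
    """Average per-patch class probabilities, then pick the argmax class."""
    n_patches, n_classes = len(patch_probs), len(patch_probs[0])
    avg = [sum(p[c] for p in patch_probs) / n_patches
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# three patches, two classes (probabilities are hypothetical)
label, avg = soft_vote([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
```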
arXiv Detail & Related papers (2025-04-18T15:39:46Z)
- MIL vs. Aggregation: Evaluating Patient-Level Survival Prediction Strategies Using Graph-Based Learning [52.231128973251124]
We compare various strategies for predicting survival at the WSI and patient level. The former treats each WSI as an independent sample, mimicking the strategy adopted in other works. The latter comprises methods that either aggregate the predictions of a patient's several WSIs or automatically identify the most relevant slide.
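The two aggregation families being compared can be caricatured in a few lines: mean pooling treats all of a patient's slides equally, while selecting the maximum keeps the most pessimistic slide. The function name and scores are illustrative, not from the paper:

```python
def patient_risk(wsi_scores, how="mean"):
    """Aggregate per-WSI risk scores into one patient-level score."""
    if how == "mean":               # treat all slides equally
        return sum(wsi_scores) / len(wsi_scores)
    if how == "max":                # keep the most pessimistic slide
        return max(wsi_scores)
    raise ValueError(f"unknown aggregation: {how}")

patient_risk([0.2, 0.4, 0.9])          # mean pooling
patient_risk([0.2, 0.4, 0.9], "max")   # slide selection
```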
arXiv Detail & Related papers (2025-03-29T11:14:02Z)
- From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis [81.19923502845441]
We develop a graph-based framework that constructs WSI graph representations.
We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches.
In our method's final step, we solve the diagnostic task through a graph attention network.
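The attention step in a graph attention network boils down to softmax-normalising compatibility scores over each node's neighbours; this generic GAT-style weighting is a sketch, not the authors' exact architecture:

```python
import math

def attention_weights(scores):
    """Softmax-normalise raw neighbour compatibility scores for one node."""
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# hypothetical compatibility scores of one node's three neighbours
w = attention_weights([2.0, 1.0, 0.1])
```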
arXiv Detail & Related papers (2025-03-14T20:15:04Z)
- Robust Multimodal Survival Prediction with the Latent Differentiation Conditional Variational AutoEncoder [18.519138120118125]
We propose a Conditional Latent Differentiation Variational AutoEncoder (LD-CVAE) for robust multimodal survival prediction. Specifically, a Variational Information Bottleneck Transformer (VIB-Trans) module is proposed to learn compressed pathological representations from the gigapixel WSIs. We develop a novel Latent Differentiation Variational AutoEncoder (LD-VAE) to learn the common and specific posteriors for the genomic embeddings with diverse functions.
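At the heart of any VAE-style module such as LD-CVAE is the reparameterisation trick, which (in an autodiff framework) keeps sampling differentiable with respect to the learned mean and variance. A minimal sketch with illustrative variable names:

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    In a training framework this form lets gradients flow through
    mu and log_var while the randomness stays in eps.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# with a very negative log-variance, sigma ~ 0 and z collapses to mu
z = reparameterize([0.0, 1.0], [-100.0, -100.0])
```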
arXiv Detail & Related papers (2025-03-12T15:58:37Z)
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset covering diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
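Segmentation benchmarks of this kind are commonly scored with the Dice coefficient, 2|A∩B| / (|A| + |B|). A minimal sketch on flat binary masks, not the paper's evaluation code:

```python
def dice(pred, target):
    """Dice coefficient between two flat binary (0/1) masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0  # two empty masks agree

dice([1, 1, 0, 0], [1, 0, 1, 0])
```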
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- Large-scale cervical precancerous screening via AI-assisted cytology whole slide image analysis [11.148919818020495]
Cervical cancer remains the leading gynecological malignancy, posing a persistent threat to women's health on a global scale.
Early screening via Whole Slide Image (WSI) diagnosis is critical to preventing cancer progression and improving survival rates.
However, a single pathologist review inevitably suffers false negatives due to the immense number of cells that must be reviewed within a WSI.
arXiv Detail & Related papers (2024-07-28T15:29:07Z)
- MM-SurvNet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data Fusion [18.395418853966266]
We propose a novel deep learning approach for breast cancer survival risk stratification.
We employ vision transformers, specifically the MaxViT model, for image feature extraction, and self-attention to capture intricate image relationships at the patient level.
A dual cross-attention mechanism fuses these features with genetic data, while clinical data is incorporated at the final layer to enhance predictive accuracy.
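Cross-attention fusion of the two modalities can be sketched as image-derived queries attending over genetic keys/values. This single-head, unprojected version is purely illustrative of the mechanism, not MM-SurvNet's dual design:

```python
import numpy as np

def cross_attention(queries, keys_values):
    """Single-head cross-attention: each query row attends over all
    key/value rows; the output is a convex combination of the values."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # each row sums to 1
    return w @ keys_values

img = np.random.rand(2, 4)   # two image tokens (hypothetical)
gen = np.random.rand(3, 4)   # three genetic tokens (hypothetical)
fused = cross_attention(img, gen)
```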
arXiv Detail & Related papers (2024-02-19T02:31:36Z)
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
- Context-Aware Self-Supervised Learning of Whole Slide Images [0.0]
A novel two-stage learning technique is presented in this work.
A graph representation is a natural way to capture the dependencies among regions in a WSI.
The entire slide is presented as a graph, where the nodes correspond to the patches from the WSI.
The proposed framework is then tested using WSIs from prostate and kidney cancers.
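Treating the slide as a graph of patches can be sketched by linking patches whose grid positions lie within one step of each other; this is a generic neighbourhood construction, not necessarily the paper's:

```python
import numpy as np

def patch_graph(coords, radius=1):
    """Adjacency matrix linking spatially neighbouring WSI patches.

    coords: (N, 2) grid positions of patches; edges connect patches
    within `radius` grid steps (Chebyshev distance).
    """
    coords = np.asarray(coords)
    diff = np.abs(coords[:, None, :] - coords[None, :, :]).max(-1)
    adj = (diff <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)   # no self-loops
    return adj

# three patches at hypothetical grid positions
A = patch_graph([(0, 0), (0, 1), (2, 2)])
```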
arXiv Detail & Related papers (2023-06-07T20:23:05Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
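The precision/recall and F-score metrics used in this evaluation reduce to a few lines; the counts below are hypothetical:

```python
def f_score(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)   # fraction of detections that are real lesions
    recall = tp / (tp + fn)      # fraction of real lesions that were detected
    return 2 * precision * recall / (precision + recall)

f_score(8, 2, 2)
```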
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.