HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival
Prediction from Whole Slide Image
- URL: http://arxiv.org/abs/2306.17373v1
- Date: Fri, 30 Jun 2023 02:26:49 GMT
- Title: HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival
Prediction from Whole Slide Image
- Authors: Zhuchen Shao, Yang Chen, Hao Bian, Jian Zhang, Guojun Liu, Yongbing
Zhang
- Abstract summary: Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL).
We propose a hierarchical vision Transformer framework named HVTSurv, which can encode the local-level relative spatial information.
We validate HVTSurv with 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA).
- Score: 13.100966504814604
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Survival prediction based on whole slide images (WSIs) is a challenging task
for patient-level multiple instance learning (MIL). Because each patient contributes a vast amount of
data (one or several gigapixel WSIs) and WSIs are irregularly shaped, it is
difficult to fully explore spatial, contextual, and hierarchical interactions
in the patient-level bag. Many studies adopt a random-sampling pre-processing
strategy and WSI-level aggregation models, which inevitably lose critical
prognostic information in the patient-level bag. In
this work, we propose a hierarchical vision Transformer framework named
HVTSurv, which can encode the local-level relative spatial information,
strengthen WSI-level context-aware communication, and establish patient-level
hierarchical interaction. Firstly, we design a feature pre-processing strategy,
including feature rearrangement and random window masking. Then, we devise
three layers to progressively obtain patient-level representation, including a
local-level interaction layer adopting Manhattan distance, a WSI-level
interaction layer employing spatial shuffle, and a patient-level interaction
layer using attention pooling. Moreover, the hierarchical design makes the
model more computationally efficient. Finally, we validate
HVTSurv with 3,104 patients and 3,752 WSIs across 6 cancer types from The
Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than that
of all prior weakly supervised methods over the 6 TCGA datasets. An ablation
study and attention visualizations further verify the superiority of the
proposed HVTSurv.
Implementation is available at: https://github.com/szc19990412/HVTSurv.
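The three-stage aggregation described above can be pictured with a minimal PyTorch sketch. Everything in it (module names, dimensions, plain fixed-size windowing standing in for the Manhattan-distance partition, a uniform random shuffle) is an illustrative assumption rather than the authors' implementation; the linked repository has the real code.

```python
# Illustrative three-level aggregation in the spirit of HVTSurv; all names
# and sizes are assumptions, not the official implementation.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention restricted to fixed-size windows of patch features."""
    def __init__(self, dim: int, window: int, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (num_patches, dim)
        n, d = x.shape
        pad = (-n) % self.window                # pad to whole windows
        x = torch.cat([x, x.new_zeros(pad, d)])
        w = x.view(-1, self.window, d)          # (num_windows, window, dim)
        w, _ = self.attn(w, w, w)
        return w.reshape(-1, d)[:n]

def spatial_shuffle(x):
    """WSI-level communication: permute patches so each new window mixes
    features that originated in different local windows."""
    return x[torch.randperm(x.shape[0])]

class AttnPool(nn.Module):
    """Patient-level attention pooling over all patch features."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1))

    def forward(self, x):                       # x: (num_patches, dim)
        a = torch.softmax(self.score(x), dim=0)
        return (a * x).sum(0)                   # (dim,)

dim, window = 256, 49
local_layer = WindowAttention(dim, window)      # local-level interaction
wsi_layer = WindowAttention(dim, window)        # WSI-level, after shuffle
pool, risk_head = AttnPool(dim), nn.Linear(dim, 1)

bag = torch.randn(1200, dim)    # one patient: patch features from all WSIs
h = local_layer(bag)
h = wsi_layer(spatial_shuffle(h))
risk = risk_head(pool(h))       # scalar risk score for survival modeling
```

The efficiency claim is visible in the shapes: attention is only ever computed inside windows of 49 tokens, so the cost grows linearly with the number of patches instead of quadratically.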
Related papers
- Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis [9.090504201460817]
Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in doctors' daily routines.
Previous methods typically employ Multiple Instance Learning to enable slide-level prediction given only slide-level labels.
To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs an approximation of full self-attention.
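For concreteness, the approximation TransMIL employs is Nyström self-attention; a minimal usage sketch with the open-source nystrom-attention package follows, with illustrative parameter values.

```python
# Linear-cost attention approximation of the kind TransMIL relies on,
# via the nystrom-attention package (pip install nystrom-attention).
import torch
from nystrom_attention import NystromAttention

attn = NystromAttention(
    dim=512,            # patch-feature dimension (illustrative)
    dim_head=64,
    heads=8,
    num_landmarks=256,  # the sequence is summarized by this many landmarks
    pinv_iterations=6,  # iterations of the approximate pseudo-inverse
)

x = torch.randn(1, 30000, 512)  # one slide: tens of thousands of patch tokens
out = attn(x)                   # cost roughly linear, not quadratic, in length
```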
arXiv Detail & Related papers (2024-10-18T06:12:36Z)
- Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images [1.1813933389519358]
In computational pathology, extracting spatial features from gigapixel whole slide images (WSIs) is a fundamental task.
We introduce a model that combines a message-passing graph neural network (GNN) with a state space model (Mamba) to capture both local and global spatial relationships.
The model's effectiveness was demonstrated in predicting progression-free survival among patients with early-stage lung adenocarcinomas.
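A rough sketch of that local/global split, assuming torch-geometric for the message-passing stream and the mamba-ssm package (CUDA only) for the state space block; the graph construction, sizes, and patch ordering are illustrative choices, not the paper's design.

```python
# Local + global aggregator sketch: a GNN over a spatial patch graph for
# local structure, then a Mamba block over the patch sequence for global
# context. All sizes and the dummy graph are illustrative.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv   # local message passing
from mamba_ssm import Mamba              # global state space model (CUDA)

class LocalGlobal(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gnn = GCNConv(dim, dim)
        self.ssm = Mamba(d_model=dim)

    def forward(self, x, edge_index):
        # Local: each patch aggregates its spatial neighbors on the WSI graph.
        h = torch.relu(self.gnn(x, edge_index))
        # Global: treat patches (e.g., in raster order) as one long sequence.
        h = self.ssm(h.unsqueeze(0)).squeeze(0)
        return h.mean(0)                 # slide-level feature

model = LocalGlobal().cuda()             # mamba-ssm requires a CUDA device
x = torch.randn(5000, 256, device="cuda")             # 5,000 patch features
edge_index = torch.randint(0, 5000, (2, 20000), device="cuda")  # dummy graph
feat = model(x, edge_index)
```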
arXiv Detail & Related papers (2024-06-05T22:06:57Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification [39.53132774980783]
Bag Embedding Loss (BEL) forces the model to learn a discriminative bag-level representation by minimizing the distance between bag embeddings of the same class and maximizing the distance between different classes.
We show that with BEL, TransMIL outperforms the baseline models on both datasets.
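Read literally, that description maps onto a pairwise margin objective over bag embeddings. The sketch below is one interpretation of the abstract, not necessarily the paper's exact formulation.

```python
# Bag-embedding loss sketch in the spirit of BEL: pull bag embeddings of
# the same class together, push different classes at least a margin apart.
import torch

def bag_embedding_loss(emb, labels, margin: float = 1.0):
    """emb: (num_bags, dim) bag embeddings; labels: (num_bags,) class ids."""
    dist = torch.cdist(emb, emb)                   # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos = dist[same & ~eye]                        # same class, not self
    neg = dist[~same]                              # different classes
    return pos.mean() + torch.relu(margin - neg).mean()

emb = torch.randn(8, 512, requires_grad=True)      # 8 slide-level embeddings
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
bag_embedding_loss(emb, labels).backward()         # add to the usual MIL loss
```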
arXiv Detail & Related papers (2023-03-02T16:02:55Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
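One generic way to realize such a pathology-gene mapping is cross-attention from genomic tokens to patch tokens. The sketch below only illustrates that idea; it is not the paper's architecture, and all sizes are invented.

```python
# Cross-attention fusion sketch: genomic embeddings query WSI patch features.
import torch
import torch.nn as nn

dim = 256
cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

patches = torch.randn(1, 4000, dim)   # WSI patch features
genes = torch.randn(1, 6, dim)        # 6 genomic-signature embeddings

# Each gene token attends to the patch regions most related to it.
fused, attn_w = cross_attn(query=genes, key=patches, value=patches)
slide_repr = fused.mean(dim=1)        # (1, dim) multimodal representation
```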
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Patient-level Microsatellite Stability Assessment from Whole Slide Images By Combining Momentum Contrast Learning and Group Patch Embeddings [6.40476282000118]
Current approaches sidestep the high resolution of WSIs by first classifying small patches extracted from them.
We introduce an effective approach that leverages the high-resolution information of WSIs through momentum contrastive learning of patch embeddings.
Our approach achieves up to 7.4% better accuracy than the straightforward patch-level classification and patient-level aggregation approach.
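The core of momentum contrastive learning is a slowly moving key encoder that tracks the trained query encoder. A minimal sketch follows; the encoder, momentum, and temperature values are illustrative, not the paper's settings.

```python
# Momentum contrast in miniature: encoder_k is never trained by backprop,
# only nudged toward encoder_q.
import copy
import torch
import torch.nn as nn

encoder_q = nn.Linear(384, 128)                  # trained by backprop
encoder_k = copy.deepcopy(encoder_q)             # updated only by momentum
for p in encoder_k.parameters():
    p.requires_grad = False

@torch.no_grad()
def momentum_update(m: float = 0.999):
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1.0 - m)       # pk = m*pk + (1-m)*pq

x_q, x_k = torch.randn(32, 384), torch.randn(32, 384)  # two views per patch
q, k = encoder_q(x_q), encoder_k(x_k)
logits = q @ k.T / 0.07                          # InfoNCE logits, temp. 0.07
loss = nn.functional.cross_entropy(logits, torch.arange(32))
loss.backward()
momentum_update()                                # after each optimizer step
```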
arXiv Detail & Related papers (2022-08-22T16:31:43Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
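A single conditional affine coupling layer, the building block named above, can be sketched directly; the dimensions and the conditioning scheme are illustrative assumptions.

```python
# One conditional affine coupling layer: half the features parameterize an
# invertible affine transform of the other half, conditioned on semantics.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, dim),            # produces scale and shift
        )

    def forward(self, x, c):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(torch.cat([x1, c], -1)).chunk(2, dim=-1)
        s = torch.tanh(s)                   # keep scales numerically stable
        y2 = x2 * torch.exp(s) + t          # invertible given x1 and c
        return torch.cat([x1, y2], -1), s.sum(-1)   # output, log|det J|

    def inverse(self, y, c):
        y1, y2 = y.chunk(2, dim=-1)
        s, t = self.net(torch.cat([y1, c], -1)).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], -1)

layer = ConditionalAffineCoupling(dim=512, cond_dim=300)
x, c = torch.randn(4, 512), torch.randn(4, 300)   # features, class semantics
y, log_det = layer(x, c)
x_rec = layer.inverse(y, c)                # recovers x up to float error
```

Stacking several such layers, with the class-semantic vector as the condition, yields a generator that can be run in reverse to hallucinate features for unseen classes.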
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
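The two-stream idea can be sketched as two GCN streams over the same mesh graph, one per raw attribute view, fused before the segmentation head. The attribute choices and the simple concatenation fusion here are illustrative, not the paper's exact design.

```python
# Two-stream GCN sketch: separate streams for two attribute views of the
# same mesh graph (e.g., cell coordinates vs. normals), fused at the end.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class TwoStreamGCN(nn.Module):
    def __init__(self, dim_a, dim_b, hidden, classes):
        super().__init__()
        self.stream_a = GCNConv(dim_a, hidden)   # e.g., coordinate view
        self.stream_b = GCNConv(dim_b, hidden)   # e.g., normal-vector view
        self.head = nn.Linear(2 * hidden, classes)

    def forward(self, xa, xb, edge_index):
        ha = torch.relu(self.stream_a(xa, edge_index))
        hb = torch.relu(self.stream_b(xb, edge_index))
        return self.head(torch.cat([ha, hb], -1))   # per-cell class logits

model = TwoStreamGCN(dim_a=12, dim_b=12, hidden=64, classes=17)
xa, xb = torch.randn(1000, 12), torch.randn(1000, 12)
edge_index = torch.randint(0, 1000, (2, 6000))      # dummy mesh adjacency
logits = model(xa, xb, edge_index)                  # (1000, 17)
```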
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Learning A 3D-CNN and Transformer Prior for Hyperspectral Image Super-Resolution [80.93870349019332]
We propose a novel HSISR method that uses a Transformer instead of a CNN to learn the prior of HSIs.
Specifically, we first use a gradient algorithm to solve the HSISR model, and then use an unfolding network to simulate the iterative solution process.
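A minimal sketch of the unfolding idea: alternate a data-fidelity gradient step with a learned prior step for a fixed number of stages. The degradation operator and the prior network below are toy stand-ins (a small CNN rather than the paper's Transformer prior).

```python
# Deep unfolding sketch: K iterations of gradient descent on ||A(x) - y||^2,
# each followed by a learned refinement. A and its adjoint are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, step = 5, 0.5
prior = nn.Sequential(nn.Conv2d(31, 31, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(31, 31, 3, padding=1))   # learned prior

def degrade(x):            # toy degradation A: 2x spatial downsampling
    return F.avg_pool2d(x, 2)

def degrade_T(y):          # its (approximate) adjoint: upsampling
    return F.interpolate(y, scale_factor=2)

y = torch.randn(1, 31, 32, 32)         # observed low-res hyperspectral cube
x = degrade_T(y)                       # initial high-res estimate

for _ in range(K):                     # the "unfolded" iterations
    grad = degrade_T(degrade(x) - y)   # gradient of the data-fidelity term
    x = x - step * grad                # gradient-algorithm step
    x = x + prior(x)                   # learned prior / refinement step
```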
arXiv Detail & Related papers (2021-11-27T15:38:57Z)
- Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning [67.40866334083941]
We propose an end-to-end 3-D lightweight convolutional neural network (CNN) for HSI classification with limited samples.
Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and a lower computation cost.
Our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.
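For intuition about "lightweight": a common device in this literature is replacing full 3-D convolutions with depthwise-separable ones, which cuts parameters sharply. The paper's exact 3-D-LWNet design differs, so the comparison below is generic.

```python
# Parameter cost of a full 3-D convolution vs. a depthwise-separable one.
import torch
import torch.nn as nn

cin, cout = 32, 64
full = nn.Conv3d(cin, cout, kernel_size=3, padding=1)
light = nn.Sequential(                              # separable variant
    nn.Conv3d(cin, cin, 3, padding=1, groups=cin),  # depthwise
    nn.Conv3d(cin, cout, 1),                        # pointwise
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full), count(light))    # 55360 vs. 3008 parameters

x = torch.randn(1, cin, 16, 9, 9)   # dummy (batch, C, D, H, W) HSI input
assert full(x).shape == light(x).shape
```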
arXiv Detail & Related papers (2020-12-07T03:44:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.