Hierarchical Transformer for Survival Prediction Using Multimodality
Whole Slide Images and Genomics
- URL: http://arxiv.org/abs/2211.16632v1
- Date: Tue, 29 Nov 2022 23:47:56 GMT
- Title: Hierarchical Transformer for Survival Prediction Using Multimodality
Whole Slide Images and Genomics
- Authors: Chunyuan Li, Xinliang Zhu, Jiawen Yao and Junzhou Huang
- Abstract summary: Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
- Score: 63.76637479503006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning good representations of gigapixel whole slide pathology images
(WSIs) for downstream tasks is critical. Previous studies employ multiple
instance learning (MIL) to represent WSIs as bags of sampled patches because,
in most cases, only slide-level labels are available and only a tiny region of
the WSI is disease-positive. However, WSI representation learning remains an
open problem due to: (1) patch sampling at a higher resolution may fail to
capture microenvironment information, such as the relative position between
tumor cells and surrounding tissues, while patches at a lower resolution lose
fine-grained detail; (2) extracting patches from a giant WSI results in a large
bag size, which greatly increases the computational cost. To address these
problems, this paper proposes a hierarchical multimodal transformer framework
that learns a hierarchical mapping between pathology images and the
corresponding genes. Specifically, we randomly extract instance-level patch
features from WSIs at different magnifications. A co-attention mapping between
imaging and genomics is then learned to uncover the pairwise interactions and
reduce the space complexity of the imaging features. Such early fusion makes it
computationally feasible to use a MIL Transformer for the survival prediction
task. Our architecture requires fewer GPU resources than benchmark methods
while maintaining better WSI representation ability. We evaluate our approach
on five cancer types from The Cancer Genome Atlas (TCGA) database and achieve
an average c-index of $0.673$, outperforming the state-of-the-art
multimodality methods.
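
The central computational idea in the abstract, using genomic tokens as co-attention queries so that the large bag of patch features is compressed before a MIL Transformer processes it, can be illustrated with a short sketch. This is a minimal illustration under assumed dimensions (256-d features, 32 genomic tokens, a single linear risk head), not the authors' released code.

```python
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    """Maps a large bag of patch embeddings onto a small set of genomic
    query tokens, shrinking the sequence the Transformer has to process."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.coattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, genomic_tokens, patch_tokens):
        # genomic_tokens: (B, G, d); patch_tokens: (B, N, d) with N >> G
        fused, _ = self.coattn(query=genomic_tokens,
                               key=patch_tokens,
                               value=patch_tokens)
        return fused  # (B, G, d): bag size reduced from N to G

class SurvivalMILTransformer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.fusion = CoAttentionFusion(d_model, n_heads)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.risk_head = nn.Linear(d_model, 1)  # proportional-hazards style risk

    def forward(self, genomic_tokens, patch_tokens):
        fused = self.fusion(genomic_tokens, patch_tokens)   # early fusion
        encoded = self.encoder(fused)                        # MIL Transformer
        slide_repr = encoded.mean(dim=1)                     # pool the G fused tokens
        return self.risk_head(slide_repr).squeeze(-1)        # higher = higher risk

# Usage with random tensors: 6000 patch features vs. 32 genomic query tokens.
model = SurvivalMILTransformer()
patches = torch.randn(1, 6000, 256)
genes = torch.randn(1, 32, 256)
risk = model(genes, patches)
```

Because the co-attention output has only as many tokens as genomic queries (G) rather than patches (N), the subsequent self-attention cost drops from O(N^2) to O(G^2), which is what makes running a MIL Transformer on gigapixel bags feasible with modest GPU resources.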
Related papers
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z) - A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - Gene-induced Multimodal Pre-training for Image-omic Classification [20.465959546613554]
This paper proposes a Gene-induced Multimodal Pre-training framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks.
Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% accuracy for image-omic classification.
arXiv Detail & Related papers (2023-09-06T04:30:15Z) - BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance
Whole Slide Image Classification [39.53132774980783]
Bag Embedding Loss (BEL) forces the model to learn a discriminative bag-level representation by minimizing the distance between bag embeddings of the same class and maximizing the distance between embeddings of different classes (a minimal sketch of this idea appears after this list).
We show that with BEL, TransMIL outperforms the baseline models on both datasets.
arXiv Detail & Related papers (2023-03-02T16:02:55Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), which embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selecting. Then the selected patches are fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
arXiv Detail & Related papers (2022-03-09T16:17:47Z) - An Efficient Cervical Whole Slide Image Analysis Framework Based on
Multi-scale Semantic and Spatial Features using Deep Learning [2.7218168309244652]
This study designs a novel inline connection network (InCNet) by enriching the multi-scale connectivity to build the lightweight model named You Only Look Cytopathology Once (YOLCO).
The proposed model allows the input size to be enlarged to the megapixel level, so that the WSI can be stitched without any overlap via averaged repeats.
With a Transformer classifying the integrated multi-scale, multi-task features, the experiments show an AUC score of $0.872$, better and $2.51\times$ faster than the best conventional method in WSI classification.
arXiv Detail & Related papers (2021-06-29T06:24:55Z) - An End-to-End Breast Tumour Classification Model Using Context-Based
Patch Modelling- A BiLSTM Approach for Image Classification [19.594639581421422]
We integrate the contextual relationship among the extracted patches, along with their feature-based correlation, within the particular tumorous region.
We trained and tested our model on two datasets, microscopy images and WSI tumour regions.
We found that BiLSTMs with CNN features perform much better at modelling patches in an end-to-end image classification network.
arXiv Detail & Related papers (2021-06-05T10:43:58Z)
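
As noted in the BEL entry above, the Bag Embedding Loss pulls bag-level embeddings of the same class together and pushes different classes apart. A minimal contrastive-style sketch of that idea (the margin value and squared-distance form are assumptions, not taken from the paper) could look like:

```python
import torch
import torch.nn.functional as F

def bag_embedding_loss(bag_embeddings, labels, margin=1.0):
    """Pull bag embeddings of the same class together and push bags of
    different classes at least `margin` apart (margin is an assumed value)."""
    dists = torch.cdist(bag_embeddings, bag_embeddings)       # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)          # (B, B) same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = dists[same & ~eye]                                   # same class, excluding self
    neg = dists[~same]                                         # different classes
    pos_term = pos.pow(2).mean() if pos.numel() else dists.new_zeros(())
    neg_term = F.relu(margin - neg).pow(2).mean() if neg.numel() else dists.new_zeros(())
    return pos_term + neg_term
```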
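Finally, the main abstract above reports an average c-index of $0.673$ across five TCGA cancer types. For readers unfamiliar with the metric, below is a generic implementation of Harrell's concordance index, a common form of the c-index for survival prediction; it is a reference sketch and not code from any of the papers listed. Higher predicted risk should correspond to earlier observed events.

```python
def concordance_index(event_times, predicted_risks, event_observed):
    """Harrell's c-index: the fraction of comparable pairs for which the
    higher-risk patient experiences the event earlier. A pair (i, j) is
    comparable when i had an observed event and event_times[i] < event_times[j]."""
    concordant, ties, comparable = 0.0, 0.0, 0
    n = len(event_times)
    for i in range(n):
        if not event_observed[i]:
            continue
        for j in range(n):
            if event_times[i] < event_times[j]:
                comparable += 1
                if predicted_risks[i] > predicted_risks[j]:
                    concordant += 1
                elif predicted_risks[i] == predicted_risks[j]:
                    ties += 0.5
    return (concordant + ties) / comparable if comparable else float("nan")
```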