M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics
Prediction from Histopathology Images
- URL: http://arxiv.org/abs/2401.10608v2
- Date: Wed, 24 Jan 2024 13:47:33 GMT
- Authors: Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen
Lin
- Abstract summary: M2ORT is a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images.
We have tested M2ORT on three public ST datasets and the experimental results show that M2ORT can achieve state-of-the-art performance.
- Score: 17.158450092707042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of Spatial Transcriptomics (ST) has facilitated the
spatially-aware profiling of gene expressions based on histopathology images.
Although ST data offers valuable insights into the micro-environment of tumors,
its acquisition cost remains high. Therefore, directly predicting the ST
expressions from digital pathology images is desired. Current methods usually
adopt existing regression backbones for this task, which ignore the inherent
multi-scale hierarchical data structure of digital pathology images. To address
this limitation, we propose M2ORT, a many-to-one regression Transformer that can
accommodate the hierarchical structure of the pathology images through a
decoupled multi-scale feature extractor. Different from traditional models that
are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology
images of different magnifications at a time to jointly predict the gene
expressions at their corresponding common ST spot, aiming at learning a
many-to-one relationship through training. We have tested M2ORT on three public
ST datasets and the experimental results show that M2ORT can achieve
state-of-the-art performance with fewer parameters and floating-point
operations (FLOPs). The code is available at:
https://github.com/Dootmaan/M2ORT/.
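
The many-to-one pairing described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the patch size, the downsampling factors, and the toy linear fusion head are all assumptions made for illustration, and average pooling stands in for reading lower magnification levels from a real slide. It only shows the data flow, in which several views of different fields of view, centred on the same ST spot, share a single expression label.

```python
import numpy as np


def crop_centered(slide, cy, cx, size):
    """Crop a size x size window centred on (cy, cx), zero-padding past the borders."""
    half = size // 2
    padded = np.pad(slide, ((half, half), (half, half), (0, 0)), mode="constant")
    # Pixel (cy, cx) of the original sits at (cy + half, cx + half) after padding,
    # so the window starting at (cy, cx) in padded coordinates is centred on the spot.
    return padded[cy:cy + size, cx:cx + size]


def downsample(patch, factor):
    """Average-pool by an integer factor (stand-in for a lower magnification level)."""
    h, w, c = patch.shape
    return patch.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def many_to_one_sample(slide, spot_yx, expression, base=8, factors=(1, 2, 4)):
    """Build several same-sized views of one spot, all paired with a single label."""
    cy, cx = spot_yx
    views = [downsample(crop_centered(slide, cy, cx, base * f), f) for f in factors]
    return views, expression


def predict(views, weights, bias):
    """Toy fusion head (assumption): per-view channel means, concatenated, one linear map."""
    feats = np.concatenate([v.mean(axis=(0, 1)) for v in views])
    return feats @ weights + bias


rng = np.random.default_rng(0)
slide = rng.random((64, 64, 3))          # hypothetical RGB slide region
expression = rng.random(5)               # hypothetical 5-gene label for one spot
views, label = many_to_one_sample(slide, (32, 32), expression)
weights = rng.normal(size=(len(views) * 3, label.size))
pred = predict(views, weights, np.zeros(label.size))
print([v.shape for v in views], pred.shape)
```

Every view has the same pixel dimensions but covers a different field of view, so a model consuming them sees both fine detail and surrounding context for the same spot. In the paper this role is played by a Transformer with a decoupled multi-scale feature extractor rather than the linear head used here.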
Related papers
- M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images [16.19308597273405]
We propose M2OST, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images.
Unlike traditional models that are trained with one-to-one image-label pairs, M2OST uses multiple images from different levels of the digital pathology image to jointly predict the gene expressions in their common corresponding spot.
M2OST can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs).
arXiv Detail & Related papers (2024-09-23T15:06:37Z)
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling.
For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views.
For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR).
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
- Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework [16.864720020158906]
We propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture.
We decompose the traditional problem of synthesizing CT images into distinct subtasks.
To enhance the framework's versatility in handling multi-modal data, we expand the model with multiple image channels.
arXiv Detail & Related papers (2023-12-13T18:22:38Z)
- Masked Pre-Training of Transformers for Histology Image Analysis [4.710921988115685]
In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction.
Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches.
We propose a pretext task for training the transformer model without labeled data to address this problem.
Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
arXiv Detail & Related papers (2023-04-14T23:56:49Z)
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Spatiotemporal Feature Learning Based on Two-Step LSTM and Transformer for CT Scans [2.3682456328966115]
We propose a novel, effective two-step approach to tackle this issue thoroughly for COVID-19 symptom classification.
First, the semantic feature embedding of each slice of a CT scan is extracted by conventional backbone networks.
Then, we propose a long short-term memory (LSTM) and Transformer-based sub-network to deal with temporal feature learning.
arXiv Detail & Related papers (2022-07-04T16:59:05Z)
- Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution [55.52779466954026]
Multi-contrast super-resolution (SR) reconstruction is promising to yield SR images with higher quality.
Existing methods lack effective mechanisms to match and fuse these features for better reconstruction.
We propose a novel network to address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques.
arXiv Detail & Related papers (2022-03-26T01:42:59Z)
- Multi-layer Clustering-based Residual Sparsifying Transform for Low-dose CT Image Reconstruction [11.011268090482575]
We propose a network-structured sparsifying transform learning approach for X-ray computed tomography (CT) reconstruction.
We apply the MCST model to low-dose CT reconstruction by deploying the learned MCST model into the regularizer in penalized weighted least squares (PWLS) reconstruction.
Our simulation results demonstrate that PWLS-MCST achieves better image reconstruction quality than the conventional FBP method and PWLS with edge-preserving (EP) regularizer.
arXiv Detail & Related papers (2022-03-22T09:38:41Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the subjects/patients and sub-modalities correlations.
We show the applicability of MGP-VAE on brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)