Related papers: M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

URL: http://arxiv.org/abs/2401.10608v2
Date: Wed, 24 Jan 2024 13:47:33 GMT
Title: M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images
Authors: Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin
Abstract summary: M2ORT is a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images. We have tested M2ORT on three public ST datasets and the experimental results show that M2ORT can achieve state-of-the-art performance.
Score: 17.158450092707042
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, it remains expensive to acquire. Directly predicting ST expressions from digital pathology images is therefore desirable. Current methods usually adopt existing regression backbones for this task, which ignore the inherent multi-scale hierarchical data structure of digital pathology images. To address this limitation, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images through a decoupled multi-scale feature extractor. Unlike traditional models that are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology images of different magnifications at a time to jointly predict the gene expressions at their common corresponding ST spot, aiming to learn a many-to-one relationship through training. We have tested M2ORT on three public ST datasets, and the experimental results show that M2ORT can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code is available at: https://github.com/Dootmaan/M2ORT/.
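Illustration: the many-to-one idea in the abstract can be sketched as a toy example in which several patches of the same spot, taken at different magnifications, are encoded separately and then combined into a single gene-expression prediction. This is not the authors' architecture. The linear projections stand in for the Transformer-based feature extractor, and all names and sizes (PATCH_SIZES, FEATURE_DIM, NUM_GENES, encode, many_to_one_predict) are hypothetical.

```python
import random


def encode(patch, weights):
    """Toy per-scale encoder: a linear projection of a flattened patch."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def many_to_one_predict(patches, encoders, head):
    """Encode each magnification separately, concatenate, then regress."""
    features = []
    for patch, weights in zip(patches, encoders):
        features.extend(encode(patch, weights))
    # One shared regression head maps the fused features to gene expressions.
    return [sum(w * f for w, f in zip(row, features)) for row in head]


def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
PATCH_SIZES = [16, 32, 64]  # assumed flattened patch lengths, one per magnification
FEATURE_DIM = 4             # assumed feature size produced by each encoder
NUM_GENES = 5               # assumed number of genes predicted per ST spot

# One decoupled encoder per magnification level.
encoders = [rand_matrix(FEATURE_DIM, n) for n in PATCH_SIZES]
head = rand_matrix(NUM_GENES, FEATURE_DIM * len(PATCH_SIZES))

# Three inputs (many) for one spot; a single expression vector out (one).
patches = [[random.random() for _ in range(n)] for n in PATCH_SIZES]
prediction = many_to_one_predict(patches, encoders, head)
print(len(prediction))
```

The sketch keeps a separate encoder per magnification and fuses the features only before the regression head. This is one simple reading of "decoupled multi-scale feature extractor"; the paper and its repository define the actual design.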

Related papers

Towards Spatial Transcriptomics-guided Pathological Image Recognition with Batch-Agnostic Encoder [5.024983453990064]
We propose a batch-agnostic contrastive learning framework that can extract consistent signals from gene expression of ST in multiple patients. Experiments demonstrated the effectiveness of our framework on a publicly available dataset.
arXiv Detail & Related papers (2025-03-10T10:50:33Z)
DELST: Dual Entailment Learning for Hyperbolic Image-Gene Pretraining in Spatial Transcriptomics [38.94542898899791]
We propose DELST, the first framework to embed hyperbolic representations while modeling hierarchy for image-gene pretraining. Our framework achieves improved predictive performance compared to existing methods.
arXiv Detail & Related papers (2025-03-02T09:00:09Z)
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images [16.19308597273405]
We propose M2OST, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images. Unlike traditional models that are trained with one-to-one image-label pairs, M2OST uses multiple images from different levels of the digital pathology image to jointly predict the gene expressions in their common corresponding spot. M2OST can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs).
arXiv Detail & Related papers (2024-09-23T15:06:37Z)
Enhanced Self-supervised Learning for Multi-modality MRI Segmentation and Classification: A Novel Approach Avoiding Model Collapse [6.3467517115551875]
Multi-modality magnetic resonance imaging (MRI) can provide complementary information for computer-aided diagnosis. Traditional deep learning algorithms are suitable for identifying specific anatomical structures, segmenting lesions, and classifying diseases with magnetic resonance images. Self-supervised learning (SSL) can effectively learn feature representations from unlabeled data by pre-training and has been shown to be effective in natural image analysis. Most SSL methods ignore the similarity of multi-modality MRI, leading to model collapse. We establish and validate a multi-modality MRI masked autoencoder consisting of hybrid mask pattern (HMP) and pyramid barlow twin (PBT
arXiv Detail & Related papers (2024-07-15T01:11:30Z)
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling. For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views. For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR).
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework [16.864720020158906]
We propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture. We decompose the traditional problem of synthesizing CT images into distinct subtasks. To enhance the framework's versatility in handling multi-modal data, we expand the model with multiple image channels.
arXiv Detail & Related papers (2023-12-13T18:22:38Z)
Masked Pre-Training of Transformers for Histology Image Analysis [4.710921988115685]
In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. We propose a pretext task for training the transformer model without labeled data to address this problem. Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
arXiv Detail & Related papers (2023-04-14T23:56:49Z)
PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful tool in aiding image understanding. The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel-level whole slide pathology images (WSIs) for downstream tasks is critical. This paper proposes a hierarchical multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes. Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution [55.52779466954026]
Multi-contrast super-resolution (SR) reconstruction is promising to yield SR images with higher quality. Existing methods lack effective mechanisms to match and fuse these features for better reconstruction. We propose a novel network to address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques.
arXiv Detail & Related papers (2022-03-26T01:42:59Z)
Multi-layer Clustering-based Residual Sparsifying Transform for Low-dose CT Image Reconstruction [11.011268090482575]
We propose a network-structured sparsifying transform learning approach for X-ray computed tomography (CT) reconstruction. We apply the MCST model to low-dose CT reconstruction by deploying the learned MCST model into the regularizer in penalized weighted least squares (PWLS) reconstruction. Our simulation results demonstrate that PWLS-MCST achieves better image reconstruction quality than the conventional FBP method and PWLS with edge-preserving (EP) regularizer.
arXiv Detail & Related papers (2022-03-22T09:38:41Z)
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules. Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan. MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the correlations across subjects/patients and sub-modalities. We show the applicability of MGP-VAE on brain tumor segmentation where one, two, or three of four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.