SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
- URL: http://arxiv.org/abs/2507.11588v2
- Date: Wed, 23 Jul 2025 15:22:26 GMT
- Title: SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
- Authors: Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie
- Abstract summary: Building foundational models for Spatial Transcriptomics can significantly enhance the analysis of vast and complex data sources. We propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM performs multi-scale information extraction on each ST slice to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation.
- Score: 14.008862724608415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving the spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information.
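The two-stage pipeline described in the abstract (partition a slice into sub-slices, then attend over cells using only geometry that is invariant to rotations and translations) can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the grid partition, the distance-kernel "attention", and all function names are illustrative assumptions; a real SE(2) Transformer learns its attention weights.

```python
import numpy as np

def make_subslices(coords, n_grid=2):
    """Partition a slice into an n_grid x n_grid spatial grid (a crude
    stand-in for SToFM's macro-to-micro sub-slice construction).
    Returns one index array per non-empty grid cell."""
    mins, maxs = coords.min(axis=0), coords.max(axis=0)
    bins = ((coords - mins) / (maxs - mins + 1e-9) * n_grid).astype(int)
    bins = np.clip(bins, 0, n_grid - 1)
    cell_id = bins[:, 0] * n_grid + bins[:, 1]
    return [np.where(cell_id == c)[0] for c in np.unique(cell_id)]

def se2_invariant_attention(coords, feats, scale=1.0):
    """Toy attention whose weights depend only on pairwise distances,
    so the output is unchanged by any rotation + translation of coords
    (the key property an SE(2) Transformer preserves)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.exp(-d / scale)              # closer cells attend more strongly
    w /= w.sum(axis=1, keepdims=True)   # row-normalize like softmax
    return w @ feats                    # aggregate neighbor expression
```

Because the weights are functions of pairwise distances alone, re-posing the whole slice (e.g. a scanner rotating the tissue section) leaves every cell's representation identical, which is exactly why SE(2) invariance is desirable for ST slices with arbitrary orientation.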
Related papers
- SemanticST: Spatially Informed Semantic Graph Learning for Clustering, Integration, and Scalable Analysis of Spatial Transcriptomics [3.1403380447856426]
We present SemanticST, a graph-based deep learning framework for spatial transcriptomics analysis. It supports mini-batch training, making it the first graph neural network scalable to large-scale datasets such as Xenium (500,000 cells).
arXiv Detail & Related papers (2025-06-13T06:30:48Z) - CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data. We systematically evaluate the performance of 14 open-source and closed-source LLMs, ranging from 160M to 671B parameters, on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z) - Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics instruction-tuning dataset for biological sequences. This dataset bridges the gap between large language models (LLMs) and complex biological sequence-related tasks. We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z) - Multi-modal Spatial Clustering for Spatial Transcriptomics Utilizing High-resolution Histology Images [1.3124513975412255]
Spatial transcriptomics (ST) enables transcriptome-wide gene expression profiling while preserving spatial context.
Current spatial clustering methods fail to fully integrate high-resolution histology image features with gene expression data.
We propose a novel contrastive learning-based deep learning approach that integrates gene expression data with histology image features.
arXiv Detail & Related papers (2024-10-31T00:32:24Z) - scFusionTTT: Single-cell transcriptomics and proteomics fusion with Test-Time Training layers [14.254553622632594]
scFusionTTT is a novel method for single-cell multimodal omics fusion with a TTT-based masked autoencoder.
We combine the order information of genes and proteins in the human genome with the TTT layer, fuse multimodal omics, and enhance unimodal omics analysis.
arXiv Detail & Related papers (2024-10-17T06:29:29Z) - MorphoSeg: An Uncertainty-Aware Deep Learning Method for Biomedical Segmentation of Complex Cellular Morphologies [5.50767638479269]
Deep learning has revolutionized medical and biological imaging, particularly in segmentation tasks. However, segmenting biological cells remains challenging due to the high variability and complexity of cell shapes. We introduce a novel benchmark dataset of Ntera-2 cells, a pluripotent carcinoma cell line exhibiting diverse morphologies across multiple stages of differentiation. We propose MorphoSeg, an uncertainty-aware deep learning framework for complex cellular morphology segmentation that incorporates sampling of virtual outliers from low-likelihood regions during training.
arXiv Detail & Related papers (2024-09-25T17:25:06Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen reliably generates whole-genome multi-modal single-cell data, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - Must: Maximizing Latent Capacity of Spatial Transcriptomics Data [41.70354088000952]
This paper introduces MuST (Multiple-modality Structure Transformation), a novel methodology to tackle this challenge.
It effectively integrates the multi-modality information contained in ST data into a uniform latent space, providing a foundation for all downstream tasks.
The results show that it outperforms existing state-of-the-art methods, with clear advantages in the precision of identifying and preserving the structures of tissues and biomarkers.
arXiv Detail & Related papers (2024-01-15T09:07:28Z) - Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of the data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
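The CHO-K1 entry above describes training one model jointly on natural and synthetic data so that a target such as cell count can be read off a shared representation. A minimal numpy sketch of that idea, using a linear autoencoder as a hypothetical stand-in for the paper's generative model (all data, dimensions, and the synthetic cell-count target here are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
natural = rng.normal(size=(200, 16))    # stand-in natural-image features
synthetic = rng.normal(size=(200, 16))  # stand-in synthetic-image features
counts = np.abs(natural @ rng.normal(size=16))  # illustrative cell-count target

X = np.vstack([natural, synthetic])     # train on both domains jointly
W = rng.normal(scale=0.1, size=(16, 4)) # encoder to a shared 4-d latent
V = rng.normal(scale=0.1, size=(4, 16)) # decoder back to feature space

losses, lr = [], 0.05
for _ in range(200):                    # gradient descent on reconstruction
    Z = X @ W
    err = Z @ V - X
    losses.append(np.mean(err ** 2))
    V -= lr * (Z.T @ err) / len(X)
    W -= lr * (X.T @ (err @ V.T)) / len(X)

# Read the target off the shared latent via least squares
Z_nat = natural @ W
beta, *_ = np.linalg.lstsq(Z_nat, counts, rcond=None)
pred = Z_nat @ beta
```

The point of the sketch is only the training setup: because natural and synthetic samples pass through the same encoder, the latent space is forced to be shared across domains, and the downstream estimate (here a least-squares readout) is made from that shared space.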
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.