HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
- URL: http://arxiv.org/abs/2506.11152v1
- Date: Wed, 11 Jun 2025 12:29:01 GMT
- Title: HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
- Authors: Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying
- Abstract summary: We introduce HEIST, a hierarchical graph transformer-based model for spatial transcriptomics data. HEIST is pre-trained on 22.3M cells from 124 tissues across 15 organs. It effectively encodes the microenvironmental influences in cell embeddings, enabling the discovery of spatially-informed subpopulations.
- Score: 13.66950862644406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single-cell transcriptomics has become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and transcriptional regulation at the single-cell level. With the advent of spatial transcriptomics data, we have the promise of learning about cells within a tissue context, as it provides both spatial coordinates and transcriptomic readouts. However, existing models ignore either spatial resolution or gene regulatory information. Gene regulation in cells can change depending on microenvironmental cues from neighboring cells, but existing models neglect gene regulatory patterns with hierarchical dependencies across levels of abstraction. In order to create contextualized representations of cells and genes from spatial transcriptomics data, we introduce HEIST, a hierarchical graph transformer-based foundation model for spatial transcriptomics and proteomics data. HEIST models tissue as spatial cellular neighborhood graphs, and each cell is, in turn, modeled as a gene regulatory network graph. The framework includes a hierarchical graph transformer that performs cross-level message passing and message passing within levels. HEIST is pre-trained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive learning and masked auto-encoding objectives. Unsupervised analysis of HEIST representations of cells shows that it effectively encodes the microenvironmental influences in cell embeddings, enabling the discovery of spatially-informed subpopulations that prior models fail to differentiate. Further, HEIST achieves state-of-the-art results on four downstream tasks (clinical outcome prediction, cell type annotation, gene imputation, and spatially-informed cell clustering) across multiple technologies, highlighting the importance of hierarchical modeling and GRN-based representations.
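As a rough illustration of the two-level structure the abstract describes (a spatial neighborhood graph over cells, with a gene graph inside each cell), here is a minimal Python sketch. It is not the HEIST implementation: the k-nearest-neighbor construction, the shared `gene_edges` list, the expression-based filtering, and all function names are assumptions made for illustration only.

```python
import math
import random


def knn_graph(coords, k):
    """Directed edges (i, j) where j is one of the k nearest cells to cell i."""
    edges = []
    for i, p in enumerate(coords):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def cell_gene_graph(expression, gene_edges):
    """Keep gene-gene edges whose two endpoints are both expressed in this cell.

    Assumption: a single shared gene-gene edge list is filtered per cell.
    The abstract does not say how the per-cell gene networks are built.
    """
    expressed = {g for g, x in enumerate(expression) if x > 0}
    return [(a, b) for a, b in gene_edges if a in expressed and b in expressed]


# Toy data: 6 cells with 2D coordinates and counts for 4 genes (all hypothetical).
random.seed(0)
n_cells, n_genes = 6, 4
coords = [(random.random(), random.random()) for _ in range(n_cells)]
expr = [[random.choice([0, 1, 2]) for _ in range(n_genes)] for _ in range(n_cells)]
gene_edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical shared gene-gene edges

# Upper level: one graph over cells. Lower level: one gene graph per cell.
tissue_graph = knn_graph(coords, k=2)
cell_graphs = [cell_gene_graph(e, gene_edges) for e in expr]
```

In this sketch each node of `tissue_graph` owns one entry of `cell_graphs`, which is the nesting a hierarchical model would pass messages over, both within a level and across levels.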
Related papers
- SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes [39.45743286683448]
We introduce SPATIA, a multi-scale generative and predictive model for spatial transcriptomics.
SPATIA learns cell-level embeddings by fusing image-derived morphological tokens and transcriptomic vector tokens.
We benchmark SPATIA against 13 existing models across 12 individual tasks.
arXiv Detail & Related papers (2025-07-07T06:54:02Z)
- OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling [14.455616582960557]
We introduce OmniCellTOSG, the first dataset of cell text-omic signaling graphs (TOSGs).
Each TOSG represents the signaling network of an individual or meta-cell and is labeled with information such as organ, disease, sex, age, and cell subtype.
The dataset is continuously expanding and will be updated regularly.
arXiv Detail & Related papers (2025-04-02T21:47:58Z)
- A scalable gene network model of regulatory dynamics in single cells [88.48246132084441]
We introduce a Functional Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions.
Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale.
arXiv Detail & Related papers (2025-03-25T19:19:21Z)
- HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification [0.19791587637442667]
This study introduces a novel single-stage approach for generating image-label pairs to augment histology datasets.
Unlike state-of-the-art methods that utilize diffusion models with separate components for label and image generation, our approach employs a latent diffusion model.
This model enables tailored data generation by conditioning on user-defined parameters such as cell types, quantities, and tissue types.
arXiv Detail & Related papers (2025-02-12T19:51:41Z)
- Cell-ontology guided transcriptome foundation model [18.51941953027685]
We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry.
Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks.
arXiv Detail & Related papers (2024-08-22T13:15:49Z)
- Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.
CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z)
- Topology-Guided Multi-Class Cell Context Generation for Digital Pathology [28.43244574309888]
We introduce several mathematical tools from spatial statistics and topological data analysis.
We generate high quality multi-class cell layouts for the first time.
We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.
arXiv Detail & Related papers (2023-04-05T07:01:34Z)
- Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.