Related papers: SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes

SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes

URL: http://arxiv.org/abs/2507.04704v1
Date: Mon, 07 Jul 2025 06:54:02 GMT
Title: SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes
Authors: Zhenglun Kong, Mufan Qiu, John Boesen, Xiang Lin, Sukwon Yun, Tianlong Chen, Manolis Kellis, Marinka Zitnik,
Abstract summary: We introduce SPATIA, a multi-scale generative and predictive model for spatial transcriptomics.<n> SPATIA learns cell-level embeddings by fusing image-derived morphological tokens and transcriptomic vector tokens.<n>We benchmark SPATIA against 13 existing models across 12 individual tasks.
Score: 39.45743286683448
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding how cellular morphology, gene expression, and spatial organization jointly shape tissue function is a central challenge in biology. Image-based spatial transcriptomics technologies now provide high-resolution measurements of cell images and gene expression profiles, but machine learning methods typically analyze these modalities in isolation or at limited resolution. We address the problem of learning unified, spatially aware representations that integrate cell morphology, gene expression, and spatial context across biological scales. This requires models that can operate at single-cell resolution, reason across spatial neighborhoods, and generalize to whole-slide tissue organization. Here, we introduce SPATIA, a multi-scale generative and predictive model for spatial transcriptomics. SPATIA learns cell-level embeddings by fusing image-derived morphological tokens and transcriptomic vector tokens using cross-attention and then aggregates them at niche and tissue levels using transformer modules to capture spatial dependencies. SPATIA incorporates token merging in its generative diffusion decoder to synthesize high-resolution cell images conditioned on gene expression. We assembled a multi-scale dataset consisting of 17 million cell-gene pairs, 1 million niche-gene pairs, and 10,000 tissue-gene pairs across 49 donors, 17 tissue types, and 12 disease states. We benchmark SPATIA against 13 existing models across 12 individual tasks, which span several categories including cell annotation, cell clustering, gene imputation, cross-modal prediction, and image generation. SPATIA achieves improved performance over all baselines and generates realistic cell morphologies that reflect transcriptomic perturbations.

Related papers

SemanticST: Spatially Informed Semantic Graph Learning for Clustering, Integration, and Scalable Analysis of Spatial Transcriptomics [3.1403380447856426]
We present SemanticST, a graph-based deep learning framework for spatial transcriptomics analysis.<n>It supports mini-batch training, making it the first graph neural network scalable to large-scale datasets such as Xenium (500,000 cells)
arXiv Detail & Related papers (2025-06-13T06:30:48Z)
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data [13.66950862644406]
We introduce HEIST, a hierarchical graph transformer-based model for spatial transcriptomics data.<n>HEIST is pre-trained on 22.3M cells from 124 tissues across 15 organs.<n>It effectively encodes the microenvironmental influences in cell embeddings, enabling the discovery of spatially-informed subpopulations.
arXiv Detail & Related papers (2025-06-11T12:29:01Z)
HistoSmith: Single-Stage Histology Image-Label Generation via Conditional Latent Diffusion for Enhanced Cell Segmentation and Classification [0.19791587637442667]
This study introduces a novel single-stage approach for generating image-label pairs to augment histology datasets.<n>Unlike state-of-the-art methods that utilize diffusion models with separate components for label and image generation, our approach employs a latent diffusion model.<n>This model enables tailored data generation by conditioning on user-defined parameters such as cell types, quantities, and tissue types.
arXiv Detail & Related papers (2025-02-12T19:51:41Z)
scReader: Prompting Large Language Models to Interpret scRNA-seq Data [12.767105992391555]
We propose an innovative hybrid approach that integrates the general knowledge capabilities of large language models with domain-specific representation models for single-cell omics data interpretation.<n>By inputting single-cell gene-level expression data with prompts, we effectively model cellular representations based on the differential expression levels of genes across various species and cell types.
arXiv Detail & Related papers (2024-12-24T04:28:42Z)
Multi-modal Spatial Clustering for Spatial Transcriptomics Utilizing High-resolution Histology Images [1.3124513975412255]
spatial transcriptomics (ST) enables transcriptome-wide gene expression profiling while preserving spatial context. Current spatial clustering methods fail to fully integrate high-resolution histology image features with gene expression data. We propose a novel contrastive learning-based deep learning approach that integrates gene expression data with histology image features.
arXiv Detail & Related papers (2024-10-31T00:32:24Z)
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.<n>CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View [49.03501451546763]
We identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition. We propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents.
arXiv Detail & Related papers (2024-07-14T04:41:16Z)
Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells. During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation. This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the celluar graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets. We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts. Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.