Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
- URL: http://arxiv.org/abs/2501.15598v1
- Date: Sun, 26 Jan 2025 16:52:27 GMT
- Title: Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
- Authors: Sichen Zhu, Yuchen Zhu, Molei Tao, Peng Qiu,
- Abstract summary: We present $\textbf{Stem}$ ($\textbf{S}$pa$\textbf{T}$ially resolved gene $\textbf{E}$xpression inference with diffusion $\textbf{M}$odel).
$\textbf{Stem}$ generates high-fidelity gene expression predictions with gene variation levels similar to those of the ground truth data.
Our proposed pipeline opens up the possibility of analyzing existing, easily accessible H&E stained histology images from a genomics point of view.
- Score: 11.64540208294516
- Abstract: Spatial Transcriptomics (ST) allows a high-resolution measurement of RNA sequence abundance by systematically connecting cell morphology depicted in Hematoxylin and Eosin (H&E) stained histology images to spatially resolved gene expressions. ST is a time-consuming, expensive, yet powerful experimental technique that provides new opportunities to understand cancer mechanisms at a fine-grained molecular level, which is critical for uncovering new approaches for disease diagnosis and treatment. Here, we present $\textbf{Stem}$ ($\textbf{S}$pa$\textbf{T}$ially resolved gene $\textbf{E}$xpression inference with diffusion $\textbf{M}$odel), a novel computational tool that leverages a conditional diffusion generative model to enable in silico gene expression inference from H&E stained images. By better capturing the inherent stochasticity and heterogeneity in ST data, $\textbf{Stem}$ achieves state-of-the-art performance on spatial gene expression prediction and generates biologically meaningful gene profiles for new H&E stained images at test time. We evaluate the proposed algorithm on datasets with various tissue sources and sequencing platforms, where it demonstrates clear improvement over existing approaches. $\textbf{Stem}$ generates high-fidelity gene expression predictions that exhibit gene variation levels similar to those of the ground truth data, suggesting that our method preserves the underlying biological heterogeneity. Our proposed pipeline opens up the possibility of analyzing existing, easily accessible H&E stained histology images from a genomics point of view without physically performing gene expression profiling, and empowers potential biological discovery from H&E stained histology images.
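The abstract does not spell out Stem's architecture, so the following is only a minimal sketch of the general idea it describes: a denoising diffusion model over spot-level gene expression vectors, conditioned on an H&E patch embedding. The `SpotDenoiser` MLP, the dimensions, and the linear noise schedule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of conditional diffusion for spot-level gene expression.
# Model, dimensions, and noise schedule are illustrative assumptions, not the
# Stem authors' implementation.
import torch
import torch.nn as nn

class SpotDenoiser(nn.Module):
    """Predicts the noise added to an expression vector, conditioned on an
    H&E patch embedding and the (normalized) diffusion timestep."""
    def __init__(self, gene_dim=1000, cond_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(gene_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, gene_dim),
        )

    def forward(self, x_t, cond, t):
        # Concatenate noisy expression, image condition, and timestep feature.
        return self.net(torch.cat([x_t, cond, t[:, None]], dim=-1))

def diffusion_loss(model, x0, cond, T=1000):
    """Standard DDPM noise-prediction loss with a linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t][:, None]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    pred = model(x_t, cond, t.float() / T)
    return nn.functional.mse_loss(pred, noise)

# Training step on placeholder data: x0 mimics normalized expression vectors,
# cond mimics frozen H&E patch embeddings for the same spots.
model = SpotDenoiser()
x0, cond = torch.randn(8, 1000), torch.randn(8, 512)
diffusion_loss(model, x0, cond).backward()
```

At test time one would run the reverse diffusion from Gaussian noise, conditioned on the embedding of a new H&E patch, to sample an expression profile for that spot; the stochastic sampling is what allows such a model to reflect the heterogeneity emphasized in the abstract.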
Related papers
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images [6.717786190771243]
We introduce MERGE (Multi-faceted hiErarchical gRaph for Gene Expressions), which combines a hierarchical graph construction strategy with graph neural networks (GNN) to improve gene expression predictions from whole slide images.
By clustering tissue image patches based on both spatial and morphological features, our approach fosters interactions between distant tissue locations during GNN learning.
As an additional contribution, we evaluate different data smoothing techniques that are necessary to mitigate artifacts in ST data, often caused by technical imperfections.
arXiv Detail & Related papers (2024-12-03T17:32:05Z)
- Multimodal contrastive learning for spatial gene expression prediction using histology images [13.47034080678041]
We propose $\textbf{mclSTExp}$, a multimodal contrastive learning framework with Transformer and DenseNet-121 encoders for Spatial Transcriptomics Expression prediction.
$\textbf{mclSTExp}$ achieves superior performance in predicting spatial gene expression.
It has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists.
arXiv Detail & Related papers (2024-07-11T06:33:38Z)
- Spatially Resolved Gene Expression Prediction from Histology via Multi-view Graph Contrastive Learning with HSIC-bottleneck Regularization [18.554968935341236]
We propose a Multi-view Graph Contrastive Learning framework with HSIC-bottleneck Regularization (ST-GCHB) to impute the gene expression of queried imaging spots by considering their spatial dependency.
arXiv Detail & Related papers (2024-06-18T03:07:25Z)
- Gene-Level Representation Learning via Interventional Style Transfer in Optical Pooled Screening [3.7038542578642715]
We employ a style-transfer approach to learn gene-level feature representations from images of genetically perturbed cells obtained via optical pooled screening (OPS).
Our method outperforms widely used engineered features in clustering gene representations according to gene function, demonstrating its utility for uncovering latent biological relationships.
arXiv Detail & Related papers (2024-06-11T22:56:50Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework, Genetic InfoMax (GIM), to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning [4.067498002241427]
We present BLEEP (Bi-modaL Embedding for Expression Prediction), a bi-modal embedding framework capable of generating spatially resolved gene expression profiles.
BLEEP uses contrastive learning to construct a low-dimensional joint embedding space from a reference dataset using paired image and expression profiles at micrometer resolution; a generic sketch of this kind of contrastive objective appears after this list.
We demonstrate BLEEP's effectiveness in gene expression prediction by benchmarking its performance on a human liver tissue dataset captured using the 10x Visium platform.
arXiv Detail & Related papers (2023-06-02T18:27:26Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
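Two of the related papers above (BLEEP and mclSTExp) align paired H&E image and expression embeddings with a contrastive objective, as referenced in the BLEEP entry. The sketch below shows a generic symmetric InfoNCE loss of that kind; the embedding dimension, batch size, and temperature are assumptions and do not reproduce either published implementation.

```python
# Hypothetical sketch of a bi-modal contrastive (CLIP-style) objective over
# paired H&E patch embeddings and expression embeddings, in the spirit of
# BLEEP/mclSTExp. Dimensions and temperature are assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, expr_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image/expression pairs (the diagonal) are
    pulled together, mismatched pairs within the batch are pushed apart."""
    img = F.normalize(img_emb, dim=-1)
    expr = F.normalize(expr_emb, dim=-1)
    logits = img @ expr.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(img.shape[0])
    loss_i = F.cross_entropy(logits, targets)        # image -> expression
    loss_e = F.cross_entropy(logits.t(), targets)    # expression -> image
    return 0.5 * (loss_i + loss_e)

# Usage on placeholder projected embeddings of 32 paired spots.
img_emb, expr_emb = torch.randn(32, 256), torch.randn(32, 256)
loss = contrastive_loss(img_emb, expr_emb)
```

Once such a joint space is trained, expression for a new image patch could be inferred by retrieving the expression profiles of its nearest reference neighbors in the shared embedding, which is one common use of this kind of bi-modal space.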
This list is automatically generated from the titles and abstracts of the papers on this site.