MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic
- URL: http://arxiv.org/abs/2601.13331v1
- Date: Mon, 19 Jan 2026 19:11:03 GMT
- Title: MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic
- Authors: Wei Wang, Quoc-Toan Ly, Chong Yu, Jun Bai,
- Abstract summary: We propose MultiST, a unified framework that models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion.<n>We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue.
- Score: 21.236918431473466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial transcriptomics (ST) enables transcriptome-wide profiling while preserving the spatial context of tissues, offering unprecedented opportunities to study tissue organization and cell-cell interactions in situ. Despite recent advances, existing methods often lack effective integration of histological morphology with molecular profiles, relying on shallow fusion strategies or omitting tissue images altogether, which limits their ability to resolve ambiguous spatial domain boundaries. To address this challenge, we propose MultiST, a unified multimodal framework that jointly models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. MultiST employs graph-based gene encoders with adversarial alignment to learn robust spatial representations, while integrating color-normalized histological features to capture molecular-morphological dependencies and refine domain boundaries. We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue. MultiST yields spatial domains with clearer and more coherent boundaries than existing methods, leading to more stable pseudotime trajectories and more biologically interpretable cell-cell interaction patterns. The MultiST framework and source code are available at https://github.com/LabJunBMI/MultiST.git.
Related papers
- Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology [46.83014413674925]
STAMP is a spatial transcriptomics-augmented multimodal pathology representation learning framework.<n>Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations.<n>We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance.
arXiv Detail & Related papers (2026-02-15T00:59:13Z) - Uncovering spatial tissue domains and cell types in spatial omics through cross-scale profiling of cellular and genomic interactions [26.7111709393529]
We present CellScape, a deep learning framework designed to overcome limitations for high-performance spatial transcriptomics analysis.<n>CellScape models cellular interactions in tissue space and genomic relationships among cells, producing comprehensive representations.<n>This technique uncovers biologically informative patterns that improve spatial domain segmentation.
arXiv Detail & Related papers (2026-02-13T06:22:43Z) - Multi-label Classification with Panoptic Context Aggregation Networks [61.82285737410154]
This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts.<n>PanCAN learns multi-order neighborhood relationships at each scale by combining random walks with an attention mechanism.<n>Experiments on NUS-WIDE, PASCAL VOC,2007, and MS-COCO benchmarks demonstrate that PanCAN consistently achieves competitive results.
arXiv Detail & Related papers (2025-12-29T14:16:21Z) - A Multi-scale Fused Graph Neural Network with Inter-view Contrastive Learning for Spatial Transcriptomics Data Clustering [7.214595408714774]
stMFG is proposed, a multi-scale interactive fusion graph network that introduces layer-wise cross-view attention to dynamically integrate spatial and gene features after each convolution.<n>It outperforms state-of-the-art methods, achieving up to 14% ARI improvement on certain slices.
arXiv Detail & Related papers (2025-12-18T05:13:55Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis.<n>CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy.<n>This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology [7.982889842329205]
HiFusion is a novel deep learning framework that integrates two complementary components.<n>We show that HiFusion achieves state-of-the-art performance across both 2D slide-wise cross-validation and more challenging 3D sample-specific scenarios.<n>These results underscore HiFusion's potential as a robust, accurate, and scalable solution for ST inference from routine histopathology.
arXiv Detail & Related papers (2025-11-17T04:47:39Z) - scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration.<n>Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z) - SemanticST: Spatially Informed Semantic Graph Learning for Clustering, Integration, and Scalable Analysis of Spatial Transcriptomics [3.1403380447856426]
We present SemanticST, a graph-based deep learning framework for spatial transcriptomics analysis.<n>It supports mini-batch training, making it the first graph neural network scalable to large-scale datasets such as Xenium (500,000 cells)
arXiv Detail & Related papers (2025-06-13T06:30:48Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease.<n>We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention.<n>Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Multi-modal Spatial Clustering for Spatial Transcriptomics Utilizing High-resolution Histology Images [1.3124513975412255]
spatial transcriptomics (ST) enables transcriptome-wide gene expression profiling while preserving spatial context.
Current spatial clustering methods fail to fully integrate high-resolution histology image features with gene expression data.
We propose a novel contrastive learning-based deep learning approach that integrates gene expression data with histology image features.
arXiv Detail & Related papers (2024-10-31T00:32:24Z) - Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View [49.03501451546763]
We identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition.
We propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents.
arXiv Detail & Related papers (2024-07-14T04:41:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.