DELST: Dual Entailment Learning for Hyperbolic Image-Gene Pretraining in Spatial Transcriptomics
- URL: http://arxiv.org/abs/2503.00804v1
- Date: Sun, 02 Mar 2025 09:00:09 GMT
- Title: DELST: Dual Entailment Learning for Hyperbolic Image-Gene Pretraining in Spatial Transcriptomics
- Authors: Xulin Chen, Junzhou Huang,
- Abstract summary: We propose DELST, the first framework to embed hyperbolic representations while modeling hierarchy for image-gene pretraining.<n>Our framework achieves improved predictive performance compared to existing methods.
- Score: 38.94542898899791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial transcriptomics (ST) maps gene expression within tissue at individual spots, making it a valuable resource for multimodal representation learning. Additionally, ST inherently contains rich hierarchical information both across and within modalities. For instance, different spots exhibit varying numbers of nonzero gene expressions, corresponding to different levels of cellular activity and semantic hierarchies. However, existing methods rely on contrastive alignment of image-gene pairs, failing to accurately capture the intricate hierarchical relationships in ST data. Here, we propose DELST, the first framework to embed hyperbolic representations while modeling hierarchy for image-gene pretraining at two levels: (1) Cross-modal entailment learning, which establishes an order relationship between genes and images to enhance image representation generalization; (2) Intra-modal entailment learning, which encodes gene expression patterns as hierarchical relationships, guiding hierarchical learning across different samples at a global scale and integrating biological insights into single-modal representations. Extensive experiments on ST benchmarks annotated by pathologists demonstrate the effectiveness of our framework, achieving improved predictive performance compared to existing methods. Our code and models are available at: https://github.com/XulinChen/DELST.
Related papers
- ST-Align: A Multimodal Foundation Model for Image-Gene Alignment in Spatial Transcriptomics [6.680197957317297]
Spatial transcriptomics (ST) provides high-resolution pathological images and whole-transcriptomic expression profiles at individual spots across whole-slide scales.
ST-Align is the first foundation model designed for ST that deeply aligns image-gene pairs by incorporating spatial context.
ST-Align employs specialized encoders tailored to distinct ST contexts, followed by an Attention-Based Fusion Network (ABFN) for enhanced multimodal fusion.
arXiv Detail & Related papers (2024-11-25T09:15:07Z) - M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics
Prediction from Histopathology Images [17.158450092707042]
M2ORT is a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images.
We have tested M2ORT on three public ST datasets and the experimental results show that M2ORT can achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-01-19T10:37:27Z) - Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL)
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z) - Gene-induced Multimodal Pre-training for Image-omic Classification [20.465959546613554]
This paper proposes a Gene-induced Multimodal Pre-training framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks.
Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% in accuracy for image-omic classification.
arXiv Detail & Related papers (2023-09-06T04:30:15Z) - Bi-Modality Medical Image Synthesis Using Semi-Supervised Sequential
Generative Adversarial Networks [35.358653509217994]
We propose a bi-modality medical image synthesis approach based on sequential generative adversarial network (GAN) and semi-supervised learning.
Our approach consists of two generative modules that synthesize images of the two modalities in a sequential order.
Visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-27T10:39:33Z) - The Whole Pathological Slide Classification via Weakly Supervised
Learning [7.313528558452559]
We introduce two pathological priors: nuclear disease of cells and spatial correlation of pathological tiles.
We propose a data augmentation method that utilizes stain separation during extractor training.
We then describe the spatial relationships between the tiles using an adjacency matrix.
By integrating these two views, we designed a multi-instance framework for analyzing H&E-stained tissue images.
arXiv Detail & Related papers (2023-07-12T16:14:23Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z) - Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL.Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.