Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer
- URL: http://arxiv.org/abs/2504.07061v1
- Date: Wed, 09 Apr 2025 17:24:41 GMT
- Title: Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer
- Authors: Shi Pan, Jianan Chen, Maria Secrier
- Abstract summary: Parameter Efficient Knowledge trAnsfer (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer. We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
- Score: 1.5416321520529301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gene expression profiling provides critical insights into cellular heterogeneity, biological processes and disease mechanisms. There has been increasing interest in computational approaches that predict gene expression directly from digitized histopathology images. While image foundation models have shown promise in a variety of downstream pathology analyses, their performance on gene expression prediction is still limited. Explicitly incorporating information from transcriptomic models can help image models address domain shift, yet fine-tuning and aligning foundation models can be expensive. In this work, we propose Parameter Efficient Knowledge trAnsfer (PEKA), a novel framework that leverages Block-Affine Adaptation and integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer. We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets (comprising 206,123 image tiles with matched gene expression profiles) that encompass various tissue types. PEKA achieved at least a 5% performance improvement over baseline foundation models while also outperforming alternative parameter-efficient fine-tuning strategies. We will release the code, datasets and aligned models after peer review to facilitate broader adoption and further development of parameter-efficient model alignment.
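The abstract names PEKA's ingredients (a frozen image foundation model adapted with Block-Affine Adaptation, trained against a transcriptomic teacher with knowledge distillation and structure alignment losses) without giving the formulation, so the sketch below is only one plausible reading. The `BlockAffineAdapter` class, the block-wise affine parameterization, the MSE distillation term, the cosine-similarity structure term and the 0.5 loss weight are assumptions for illustration, not the paper's implementation.

```python
# Minimal PyTorch sketch of a PEKA-style objective: a lightweight adapter on
# top of frozen pathology-image features is trained to match a transcriptomic
# teacher via a distillation term plus a structure-alignment term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockAffineAdapter(nn.Module):
    """Hypothetical block-wise affine adapter: splits the feature vector into
    blocks and applies an independent affine map (weight + bias) to each."""
    def __init__(self, dim: int, num_blocks: int = 16):
        super().__init__()
        assert dim % num_blocks == 0
        self.num_blocks = num_blocks
        block = dim // num_blocks
        self.weight = nn.Parameter(torch.eye(block).repeat(num_blocks, 1, 1))
        self.bias = nn.Parameter(torch.zeros(num_blocks, block))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        blocks = x.view(b, self.num_blocks, -1)                # (B, K, D/K)
        out = torch.einsum("bkd,kde->bke", blocks, self.weight) + self.bias
        return out.reshape(b, -1)

def alignment_losses(student_feat, teacher_feat, temperature: float = 0.1):
    """Distillation: match teacher embeddings directly (MSE).
    Structure alignment: match the batch-wise cosine-similarity structure
    of student and teacher (one common choice; the paper may differ)."""
    distill = F.mse_loss(student_feat, teacher_feat)
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    structure = F.mse_loss(s @ s.T / temperature, t @ t.T / temperature)
    return distill, structure

# Usage: adapt frozen image-tile features toward a transcriptomic teacher.
image_feat = torch.randn(32, 1024)    # frozen pathology-model features
teacher_feat = torch.randn(32, 1024)  # transcriptomic-model embeddings
adapter = BlockAffineAdapter(dim=1024)
adapted = adapter(image_feat)
distill, structure = alignment_losses(adapted, teacher_feat)
loss = distill + 0.5 * structure      # loss weighting is an assumption
loss.backward()
```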
Related papers
- Instruction-Guided Autoregressive Neural Network Parameter Generation [49.800239140036496]
We propose IGPG, an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. By autoregressively generating tokens of neural network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Experiments on multiple datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework.
arXiv Detail & Related papers (2025-04-02T05:50:19Z) - Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors [0.0]
We introduce LEAP (Layered Ensemble of Autoencoders and Predictors), a novel ensemble framework to improve robustness and generalization. LEAP consistently outperforms state-of-the-art approaches in predicting gene essentiality or drug responses in unseen cell lines, tissues and disease models.
arXiv Detail & Related papers (2025-02-21T18:12:36Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images [6.717786190771243]
We introduce MERGE (Multi-faceted hiErarchical gRaph for Gene Expressions), which combines a hierarchical graph construction strategy with graph neural networks (GNN) to improve gene expression predictions from whole slide images. By clustering tissue image patches based on both spatial and morphological features, our approach fosters interactions between distant tissue locations during GNN learning. As an additional contribution, we evaluate different data smoothing techniques that are necessary to mitigate artifacts in ST data, often caused by technical imperfections.
arXiv Detail & Related papers (2024-12-03T17:32:05Z) - Integrating Large Language Models for Genetic Variant Classification [12.244115429231888]
Large Language Models (LLMs) have emerged as transformative tools in genetics.
This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense.
Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets.
arXiv Detail & Related papers (2024-11-07T13:45:56Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders [5.950889585409067]
We propose a novel prior-knowledge-based deep auto-encoding framework, PAAE, for RNA-seq data in cancer.
We show that, despite having access to a smaller set of features, our PAAE and PAVAE models achieve better out-of-set reconstruction results compared to common methodologies.
arXiv Detail & Related papers (2023-06-09T11:12:55Z) - Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - All You Need is Color: Image based Spatial Gene Expression Prediction using Neural Stain Learning [11.9045433112067]
We propose a "stain-aware" machine learning approach for prediction of spatial transcriptomic gene expression profiles.
We have found that the gene expression predictions from the proposed approach show higher correlations with true expression values obtained through sequencing.
arXiv Detail & Related papers (2021-08-23T23:43:38Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification (a minimal sketch of this prototype step appears after this list).
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
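The Select-ProtoNet entry above builds on Prototypical Networks, as noted; a minimal sketch of that prototype step is given below. The random embeddings stand in for an encoder's outputs, and the episode sizes, function names and squared-Euclidean distance are illustrative assumptions rather than the paper's code.

```python
# Prototypical Network core step: each class prototype is the mean embedding
# of its support examples; queries are classified by distance to prototypes.
import torch
import torch.nn.functional as F

def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor, n_classes: int):
    """Average the support embeddings per class to get one prototype each."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb: torch.Tensor, protos: torch.Tensor):
    """Softmax over negative squared Euclidean distances to the prototypes."""
    dists = torch.cdist(query_emb, protos) ** 2      # (Q, C)
    return F.log_softmax(-dists, dim=-1)

# Toy episode: 3 subtypes, 5 support and 4 query embeddings per subtype.
emb_dim, n_classes = 64, 3
support_emb = torch.randn(15, emb_dim)
support_labels = torch.arange(n_classes).repeat_interleave(5)
query_emb = torch.randn(12, emb_dim)

log_probs = classify(query_emb, prototypes(support_emb, support_labels, n_classes))
pred = log_probs.argmax(dim=-1)                      # predicted subtype per query
```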