Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
- URL: http://arxiv.org/abs/2502.15646v1
- Date: Fri, 21 Feb 2025 18:12:36 GMT
- Title: Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
- Authors: Barbara Bodinier, Gaetan Dissez, Linus Bleistein, Antonin Dauvin,
- Abstract summary: We introduce LEAP (Layered Ensemble of Autoencoders and Predictors), a novel ensemble framework to improve robustness and generalization.<n>LEAP consistently outperforms state-of-the-art approaches in predicting gene essentiality or drug responses in unseen cell lines, tissues and disease models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preclinical perturbation screens, where the effects of genetic, chemical, or environmental perturbations are systematically tested on disease models, hold significant promise for machine learning-enhanced drug discovery due to their scale and causal nature. Predictive models can infer perturbation responses for previously untested disease models based on molecular profiles. These in silico labels can expand databases and guide experimental prioritization. However, modelling perturbation-specific effects and generating robust prediction performances across diverse biological contexts remain elusive. We introduce LEAP (Layered Ensemble of Autoencoders and Predictors), a novel ensemble framework to improve robustness and generalization. LEAP leverages multiple DAMAE (Data Augmented Masked Autoencoder) representations and LASSO regressors. By combining diverse gene expression representation models learned from different random initializations, LEAP consistently outperforms state-of-the-art approaches in predicting gene essentiality or drug responses in unseen cell lines, tissues and disease models. Notably, our results show that ensembling representation models, rather than prediction models alone, yields superior predictive performance. Beyond its performance gains, LEAP is computationally efficient, requires minimal hyperparameter tuning and can therefore be readily incorporated into drug discovery pipelines to prioritize promising targets and support biomarker-driven stratification. The code and datasets used in this work are made publicly available.
Related papers
- Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer.
We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.
We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.
We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Generative Intervention Models for Causal Perturbation Modeling [80.72074987374141]
In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation.
We propose a generative intervention model (GIM) that learns to map these perturbation features to distributions over atomic interventions.
arXiv Detail & Related papers (2024-11-21T10:37:57Z) - Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices [0.0]
Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability.
A 5-minute prediction window was chosen for timely intervention, with minute-levels standardizing the data.
This study highlights ML's potential to improve triage and reduce alarm fatigue.
arXiv Detail & Related papers (2024-10-30T23:24:28Z) - Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - Permutation invariant multi-output Gaussian Processes for drug combination prediction in cancer [2.1145050293719745]
Dose-response prediction in cancer is an active application field in machine learning.
The goal is to develop accurate predictive models that can be used to guide experimental design or inform treatment decisions.
arXiv Detail & Related papers (2024-06-28T18:28:38Z) - Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Unsupervised language models for disease variant prediction [3.6942566104432886]
We find that a single protein LM trained on broad sequence datasets can score pathogenicity for any gene variant zero-shot.
We show that it achieves scoring performance comparable to the state of the art when evaluated on clinically labeled variants of disease-related genes.
arXiv Detail & Related papers (2022-12-07T22:28:13Z) - Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence
Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that some simulation-based approaches are more robust (and accurate) than others for specific embedding methods to certain adversarial attacks to the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Bidirectional Representation Learning from Transformers using Multimodal
Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.