Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
- URL: http://arxiv.org/abs/2502.15646v2
- Date: Fri, 17 Oct 2025 15:47:39 GMT
- Title: Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
- Authors: Barbara Bodinier, Gaetan Dissez, Lucile Ter-Minassian, Linus Bleistein, Roberta Codato, John Klein, Eric Durand, Antonin Dauvin,
- Abstract summary: Existing predictive models suffer from limited, generalisability and interpretability.<n>We introduce a framework of Layered Ensemble of Autoencoders and Predictors (LEAP)<n>LEAP consistently improves prediction performances in unscreened cell lines across modelling strategies.
- Score: 4.882734501598445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-throughput preclinical perturbation screens, where the effects of genetic, chemical, or environmental perturbations are systematically tested on disease models, hold significant promise for machine learning-enhanced drug discovery due to their scale and causal nature. Predictive models trained on such datasets can be used to (i) infer perturbation response for previously untested disease models, and (ii) characterise the biological context that affects perturbation response. Existing predictive models suffer from limited reproducibility, generalisability and interpretability. To address these issues, we introduce a framework of Layered Ensemble of Autoencoders and Predictors (LEAP), a general and flexible ensemble strategy to aggregate predictions from multiple regressors trained using diverse gene expression representation models. LEAP consistently improves prediction performances in unscreened cell lines across modelling strategies. In particular, LEAP applied to perturbation-specific LASSO regressors (PS-LASSO) provides a favorable balance between near state-of-the-art performance and low computation time. We also propose an interpretability approach combining model distillation and stability selection to identify important biological pathways for perturbation response prediction in LEAP. Our models have the potential to accelerate the drug discovery pipeline by guiding the prioritisation of preclinical experiments and providing insights into the biological mechanisms involved in perturbation response. The code and datasets used in this work are publicly available.
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Latent Causal Diffusions for Single-Cell Perturbation Modeling [83.47931153555321]
We present a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise.<n> LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens.<n>We develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes.
arXiv Detail & Related papers (2026-01-20T16:15:38Z) - Amortized In-Context Mixed Effect Transformer Models: A Zero-Shot Approach for Pharmacokinetics [0.0]
We present the Amortized In-Context Mixed-Effect Transformer (AICMET) model.<n>It unifies mechanistic compartmental priors with amortized in-context Bayesian inference.<n>Experiments show that AICMET attains state-of-the-art predictive accuracy and faithfully quantifies inter-patient variability.
arXiv Detail & Related papers (2025-08-21T15:45:17Z) - Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets.<n>We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms.<n>Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z) - Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood [0.0]
This study introduces an integrated framework for predictive causal inference designed to overcome limitations in conventional single model approaches.<n> Specifically, we combine a Hidden Markov Model for spatial health state estimation with a Multi Task and Multi Graph Convolutional Network (MTGCN) for capturing temporal outcome trajectories.<n>To demonstrate its utility, we focus on clinical domains such as cancer, dementia, Parkinson disease, where treatment effects are challenging to observe directly.
arXiv Detail & Related papers (2025-07-11T03:11:15Z) - Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations [44.619690829431214]
We train a neural network to predict distributional responses in gene expression following genetic perturbations.<n>Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics.
arXiv Detail & Related papers (2025-07-01T06:04:28Z) - TxPert: Leveraging Biochemical Relationships for Out-of-Distribution Transcriptomic Perturbation Prediction [11.083533122552396]
We present TxPert, a new state-of-the-art method that leverages multiple biological knowledge networks to predict responses under OOD scenarios.<n>In particular, we present: (i) TxPert, a new state-of-the-art method that leverages multiple biological knowledge networks to predict responses under OOD scenarios; (ii) an in-depth analysis demonstrating the impact of graphs, model architecture, and data on performance; and (iii) an expanded benchmarking framework that strengthens evaluation standards for perturbation modeling.
arXiv Detail & Related papers (2025-05-20T21:13:23Z) - Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer.
We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.
We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.
We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.<n>The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.<n>It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - Generative Intervention Models for Causal Perturbation Modeling [80.72074987374141]
In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation.
We propose a generative intervention model (GIM) that learns to map these perturbation features to distributions over atomic interventions.
arXiv Detail & Related papers (2024-11-21T10:37:57Z) - Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices [0.0]
Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability.
A 5-minute prediction window was chosen for timely intervention, with minute-levels standardizing the data.
This study highlights ML's potential to improve triage and reduce alarm fatigue.
arXiv Detail & Related papers (2024-10-30T23:24:28Z) - Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - Permutation invariant multi-output Gaussian Processes for drug combination prediction in cancer [2.1145050293719745]
Dose-response prediction in cancer is an active application field in machine learning.
The goal is to develop accurate predictive models that can be used to guide experimental design or inform treatment decisions.
arXiv Detail & Related papers (2024-06-28T18:28:38Z) - Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Unsupervised language models for disease variant prediction [3.6942566104432886]
We find that a single protein LM trained on broad sequence datasets can score pathogenicity for any gene variant zero-shot.
We show that it achieves scoring performance comparable to the state of the art when evaluated on clinically labeled variants of disease-related genes.
arXiv Detail & Related papers (2022-12-07T22:28:13Z) - Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence
Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that some simulation-based approaches are more robust (and accurate) than others for specific embedding methods to certain adversarial attacks to the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Bidirectional Representation Learning from Transformers using Multimodal
Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.