In-silico biological discovery with large perturbation models
- URL: http://arxiv.org/abs/2503.23535v1
- Date: Sun, 30 Mar 2025 17:41:25 GMT
- Title: In-silico biological discovery with large perturbation models
- Authors: Djordje Miladinovic, Tobias Höppe, Mathieu Chevalley, Andreas Georgiou, Lachlan Stuart, Arash Mehrjou, Marcus Bantscheff, Bernhard Schölkopf, Patrick Schwab
- Abstract summary: We present the Large Perturbation Model (LPM), a deep-learning model that integrates perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments.
- Score: 46.388631244976885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
Related papers
- Contextualizing biological perturbation experiments through language [3.704686482174365]
PerturbQA is a benchmark for structured reasoning over perturbation experiments.
We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations.
As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework.
arXiv Detail & Related papers (2025-02-28T18:15:31Z)
- BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning [49.487327661584686]
We introduce BioMaze, a dataset with 5.1K complex pathway problems from real research.
Our evaluation of methods such as CoT and graph-augmented reasoning shows that LLMs struggle with pathway reasoning.
To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation.
arXiv Detail & Related papers (2025-02-23T17:38:10Z)
- Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets.
The key theoretical contribution is the structural sparsity of causal connections between modalities.
Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Automated Discovery of Pairwise Interactions from Unstructured Data [3.980555701211573]
Pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation.
We show how these tests can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations.
We validate our approach on a real biological experiment where we knocked out 50 pairs of genes and measured the effect with microscopy images.
arXiv Detail & Related papers (2024-09-11T19:53:50Z)
- Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data [1.5311478638611091]
We propose a novel heterogeneous data integration framework based on optimal transport to extract shared patterns in complex biological processes.
Our approach is effective even with a small number of subjects, and does not require auxiliary matching information for the alignment.
arXiv Detail & Related papers (2024-06-27T04:29:21Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment Design [61.48963555382729]
We propose DiscoBAX as a sample-efficient method for maximizing the rate of significant discoveries per experiment.
We provide theoretical guarantees of approximate optimality under standard assumptions, and conduct a comprehensive experimental evaluation.
arXiv Detail & Related papers (2023-12-07T06:05:39Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge [5.893494985371817]
Large language models (LLMs) can identify genes/proteins associated with pathways of interest and predict their interactions to a certain extent.
arXiv Detail & Related papers (2023-07-17T20:01:11Z)
- Decentralized policy learning with partial observation and mechanical constraints for multiperson modeling [14.00358511581803]
We propose sequential generative models with partial observation and mechanical constraints in a decentralized manner.
Our approach can be used as a multi-agent simulator to generate realistic trajectories using real-world data.
arXiv Detail & Related papers (2020-07-07T01:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.