Related papers: GeneDisco: A Benchmark for Experimental Design in Drug Discovery

GeneDisco: A Benchmark for Experimental Design in Drug Discovery

URL: http://arxiv.org/abs/2110.11875v1
Date: Fri, 22 Oct 2021 16:01:39 GMT
Title: GeneDisco: A Benchmark for Experimental Design in Drug Discovery
Authors: Arash Mehrjou, Ashkan Soleymani, Andrew Jesson, Pascal Notin, Yarin Gal, Stefan Bauer, Patrick Schwab
Abstract summary: In vitro cellular experimentation with genetic interventions is an essential step in early-stage drug discovery. GeneDisco is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.
Score: 41.6425999218259
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In vitro cellular experimentation with genetic interventions, using for example CRISPR technologies, is an essential step in early-stage drug discovery and target validation that serves to assess initial hypotheses about causal associations between biological mechanisms and disease pathologies. With billions of potential hypotheses to test, the experimental design space for in vitro genetic experiments is extremely vast, and the available experimental capacity - even at the largest research institutions in the world - pales in relation to the size of this biological hypothesis space. Machine learning methods, such as active and reinforcement learning, could aid in optimally exploring the vast biological space by integrating prior knowledge from various information sources as well as extrapolating to yet unexplored areas of the experimental design space based on available data. However, there exist no standardised benchmarks and data sets for this challenging task and little research has been conducted in this area to date. Here, we introduce GeneDisco, a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

Related papers

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [128.2992631982687]
We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones.<n>We propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground truth hypothesis.<n>We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator.
arXiv Detail & Related papers (2025-05-23T13:24:50Z)
Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data. We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work. Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation [15.495976478018264]
Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction. We construct a dataset of background-hypothesis pairs from biomedical literature, partitioned into training, seen, and unseen test sets. We assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings.
arXiv Detail & Related papers (2024-07-12T02:55:13Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
A large dataset curation and benchmark for drug target interaction [0.7699646945563469]
Bioactivity data plays a key role in drug discovery and repurposing. We propose a way to standardize and represent efficiently a very large dataset curated from multiple public sources.
arXiv Detail & Related papers (2024-01-30T17:06:25Z)
DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment Design [61.48963555382729]
We propose DiscoBAX as a sample-efficient method for maximizing the rate of significant discoveries per experiment. We provide theoretical guarantees of approximate optimality under standard assumptions, and conduct a comprehensive experimental evaluation.
arXiv Detail & Related papers (2023-12-07T06:05:39Z)
Pitfalls in Experiments with DNN4SE: An Analysis of the State of the Practice [0.7614628596146599]
We conduct a mapping study, examining 194 experiments with techniques that rely on deep neural networks appearing in 55 papers published in premier software engineering venues. Our study reveals that most of the experiments, including those that have received ACM artifact badges, have fundamental limitations that raise doubts about the reliability of their findings.
arXiv Detail & Related papers (2023-05-19T09:55:48Z)
GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets. GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
Targeted active learning for probabilistic models [8.615625517708324]
A fundamental task in science is to design experiments that yield valuable insights about the system under study. We present PDBAL, a targeted active learning method that adaptively designs experiments to maximize scientific utility.
arXiv Detail & Related papers (2022-10-21T17:22:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.