Active learning for efficient discovery of optimal gene combinations in the combinatorial perturbation space
- URL: http://arxiv.org/abs/2411.12010v2
- Date: Wed, 11 Dec 2024 11:52:24 GMT
- Title: Active learning for efficient discovery of optimal gene combinations in the combinatorial perturbation space
- Authors: Jason Qin, Hans-Hermann Wessels, Carlos Fernandez-Granda, Yuhan Hao,
- Abstract summary: NAIAD is an active learning framework that efficiently discovers optimal gene pairs.
evaluated on four CRISPR perturbation datasets totaling over 350,000 genetic interactions.
Our framework improves the identification of novel, effective gene combinations, enabling more efficient CRISPR library design.
- Score: 13.409130886920579
- License:
- Abstract: The advancement of novel combinatorial CRISPR screening technologies enables the identification of synergistic gene combinations on a large scale. This is crucial for developing novel and effective combination therapies, but the combinatorial space makes exhaustive experimentation infeasible. We introduce NAIAD, an active learning framework that efficiently discovers optimal gene pairs capable of driving cells toward desired cellular phenotypes. NAIAD leverages single-gene perturbation effects and adaptive gene embeddings that scale with the training data size, mitigating overfitting in small-sample learning while capturing complex gene interactions as more data is collected. Evaluated on four CRISPR combinatorial perturbation datasets totaling over 350,000 genetic interactions, NAIAD, trained on small datasets, outperforms existing models by up to 40\% relative to the second-best. NAIAD's recommendation system prioritizes gene pairs with the maximum predicted effects, resulting in the highest marginal gain in each AI-experiment round and accelerating discovery with fewer CRISPR experimental iterations. Our NAIAD framework (https://github.com/NeptuneBio/NAIAD) improves the identification of novel, effective gene combinations, enabling more efficient CRISPR library design and offering promising applications in genomics research and therapeutic development.
Related papers
- Deep Active Learning based Experimental Design to Uncover Synergistic Genetic Interactions for Host Targeted Therapeutics [4.247749070215763]
We present an integrated Deep Active Learning framework that incorporates information from a biological knowledge graph.
The framework is able to generate task-specific representations of genes while also balancing the exploration-exploitation trade-off to pinpoint highly effective double-knockdown pairs.
This is the first work to show promising results on double-gene knockdown experimental data of appreciable scale.
arXiv Detail & Related papers (2025-02-03T03:03:21Z) - Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z) - Weakly Supervised Set-Consistency Learning Improves Morphological Profiling of Single-Cell Images [0.6491172192043603]
We propose a set-level consistency learning algorithm, Set-DINO, to improve learned representations of perturbation effects in single-cell images.
We conduct experiments on a large-scale Optical Pooled Screening dataset with more than 5000 genetic perturbations.
arXiv Detail & Related papers (2024-06-08T00:53:30Z) - CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments [51.41735920759667]
Large Language Models (LLMs) have shown promise in various tasks, but they often lack specific knowledge and struggle to accurately solve biological design problems.
In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments.
arXiv Detail & Related papers (2024-04-27T22:59:17Z) - Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner [20.07173196364489]
This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data.
Experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types.
arXiv Detail & Related papers (2024-04-06T08:07:48Z) - CRISPR: Ensemble Model [20.268713698436326]
We propose a novel ensemble learning method for sgRNA design that is accurate and generalizable.
Our results suggest that our method can be used to design sgRNAs with high sensitivity and specificity, even for new genes or cells.
arXiv Detail & Related papers (2024-03-05T14:55:14Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Un Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - Neural Design for Genetic Perturbation Experiments [16.95249173404529]
We introduce the Optimistic Arm Elimination principle to find an almost optimal arm under different functional relationships between the queries (arms) and the outputs (rewards)
OAE also outperforms the benchmark algorithms in 3 of 4 datasets in the GeneDisco experimental planning challenge.
arXiv Detail & Related papers (2022-07-26T10:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.