Partial Product Aware Machine Learning on DNA-Encoded Libraries
- URL: http://arxiv.org/abs/2205.08020v1
- Date: Mon, 16 May 2022 23:18:02 GMT
- Title: Partial Product Aware Machine Learning on DNA-Encoded Libraries
- Authors: Polina Binder, Meghan Lawler, LaShadric Grady, Neil Carlson, Sumudu Leelananda, Svetlana Belyanskaya, Joe Franklin, Nicolas Tilmans, Henri Palacci
- Abstract summary: Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the original DEL.
We leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DNA-encoded libraries (DELs) are used for rapid large-scale screening of
small molecules against a protein target. These combinatorial libraries are
built through several cycles of chemistry and DNA ligation, producing large
sets of DNA-tagged molecules. Training machine learning models on DEL data has
been shown to be effective at predicting molecules of interest dissimilar from
those in the original DEL. Machine learning chemical property prediction
approaches rely on the assumption that the property of interest is linked to a
single chemical structure. In the context of DNA-encoded libraries, this is
equivalent to assuming that every chemical reaction fully yields the desired
product. However, in practice, multi-step chemical synthesis sometimes
generates partial molecules. Each unique DNA tag in a DEL therefore corresponds
to a set of possible molecules. Here, we leverage reaction yield data to
enumerate the set of possible molecules corresponding to a given DNA tag. This
paper demonstrates that training a custom GNN on this richer dataset improves
accuracy and generalization performance.
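
The enumeration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a simplified model in which each synthesis cycle succeeds independently with probability equal to its reported yield, and a failed cycle simply leaves that building block off. The function name `enumerate_products`, the building-block labels, and the yield values are all hypothetical.

```python
from itertools import product
from math import prod


def enumerate_products(building_blocks, yields):
    """Map each possible (partial) product to its probability.

    Assumption (not from the paper): cycle i attaches building block i
    with probability yields[i], independently of the other cycles, and
    a failed cycle skips that block without stopping later cycles.
    """
    if len(building_blocks) != len(yields):
        raise ValueError("need exactly one yield per building block")
    outcomes = {}
    # Each mask marks which cycles succeeded (True) or failed (False).
    for mask in product([True, False], repeat=len(yields)):
        prob = prod(y if ok else 1.0 - y for ok, y in zip(mask, yields))
        if prob == 0.0:
            continue  # drop outcomes that cannot occur
        attached = tuple(bb for bb, ok in zip(building_blocks, mask) if ok)
        # Accumulate in case two masks produce the same product.
        outcomes[attached] = outcomes.get(attached, 0.0) + prob
    return outcomes


# Hypothetical three-cycle DNA tag and per-cycle yields.
tag = ("A1", "B7", "C3")
cycle_yields = [0.9, 0.5, 0.8]
distribution = enumerate_products(tag, cycle_yields)

for molecule, prob in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(molecule, round(prob, 3))
```

Under these assumptions, one DNA tag maps to a weighted set of candidate structures rather than a single molecule, which is the kind of richer training set the abstract describes. A real pipeline would replace the label tuples with actual reaction products.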
Related papers
- KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors [2.0179908661487986]
We present KinDEL, one of the first large, publicly available DEL datasets on two kinases.
We benchmark different machine learning techniques to develop predictive models for hit identification.
We provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules.
arXiv Detail & Related papers (2024-10-11T16:03:58Z)
- Compositional Deep Probabilistic Models of DNA Encoded Libraries [6.206196935093064]
We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks.
Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure.
arXiv Detail & Related papers (2023-10-20T19:04:28Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach proves effective at 1) keeping the embedding space well organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
- MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks [11.994553575596228]
MolCLR is a self-supervised learning framework for large unlabeled molecule datasets.
We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
Our method achieves state-of-the-art performance on many challenging datasets.
arXiv Detail & Related papers (2021-02-19T17:35:18Z)
- Barking up the right tree: an approach to search over molecule synthesis DAGs [28.13323960125482]
Current deep generative models for molecules ignore synthesizability.
We propose a deep generative model that better represents the real-world process.
We show that our approach is able to model chemical space well, producing a wide range of diverse molecules.
arXiv Detail & Related papers (2020-12-21T17:35:06Z)
- RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design [75.14290780116002]
We train deep graph neural networks to approximate the outputs of a retrosynthesis planning software.
Our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable.
arXiv Detail & Related papers (2020-11-25T22:04:16Z)
- Machine learning on DNA-encoded libraries: A new paradigm for hit-finding [4.473676566828977]
We demonstrate a new approach applying machine learning to DEL selection data.
We train models using only DEL selection data and apply automated or automatable filters.
The approach is effective, with an overall hit rate of ~30% at 30 μM and discovery of potent compounds (IC50 < 10 nM) for every target.
arXiv Detail & Related papers (2020-01-31T19:31:23Z)