Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding
- URL: http://arxiv.org/abs/2408.07636v1
- Date: Wed, 14 Aug 2024 16:01:02 GMT
- Title: Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding
- Authors: Bing Hu, Anita Layton, Helen Chen
- Abstract summary: Imagand is a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs.
Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research.
- Score: 1.4952056744888913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are often collected independently from each other, often with limited overlap, creating data overlap sparsity. Data sparsity makes data curation difficult for researchers looking to answer research questions in poly-pharmacy, drug combination research, and high-throughput screening. We propose Imagand, a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs. We show that Imagand-generated synthetic PK data closely resembles real data univariate and bivariate distributions, and improves performance for downstream tasks. Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research. Code is available at https://github.com/bing1100/Imagand.
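To make "data overlap sparsity" concrete, the minimal Python sketch below measures how many molecules independently collected datasets share. The three datasets, their SMILES keys and the dummy values are hypothetical placeholders, and the Jaccard index is just one convenient overlap measure, not a metric taken from the paper.

```python
from itertools import combinations

# Hypothetical toy datasets: each maps a SMILES string to one measured property.
# Molecules are simple placeholders and the values are dummy numbers, not real data.
DATASETS = {
    "solubility": {"CCO": 0.0, "c1ccccc1": 0.0, "CC(=O)O": 0.0},
    "ld50": {"CCO": 0.0, "CCN": 0.0},
    "herg": {"c1ccccc1": 0.0, "CCN": 0.0, "CCC": 0.0},
}


def jaccard_overlap(a: dict, b: dict) -> float:
    """Fraction of molecules shared by two datasets (Jaccard index of SMILES keys)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def fully_labelled(datasets: dict) -> set:
    """Molecules that have a measurement in every dataset."""
    key_sets = [set(d) for d in datasets.values()]
    return set.intersection(*key_sets) if key_sets else set()


if __name__ == "__main__":
    # Pairwise overlap between every pair of datasets.
    for (name_a, a), (name_b, b) in combinations(DATASETS.items(), 2):
        print(f"{name_a} vs {name_b}: {jaccard_overlap(a, b):.2f}")
    # Joint coverage: how many molecules carry every property at once.
    all_smiles = set().union(*DATASETS.values())
    complete = fully_labelled(DATASETS)
    print(f"{len(complete)} of {len(all_smiles)} molecules have every property")
```

Run as a script, it reports that every pair of datasets shares exactly one molecule, yet none of the five molecules has all three properties. That missing joint coverage is the kind of gap that synthetic PK data from a generator such as Imagand is meant to fill.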
Related papers
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
Date: 2024-05-23T13:22:17Z
- Synthetic Data from Diffusion Models Improve Drug Discovery Prediction [1.3686993145787065]
Data sparsity makes data curation difficult for researchers looking to answer key research questions.
We propose a novel diffusion GNN model Syngand capable of generating ligand and pharmacokinetic data end-to-end.
We show initial promising results on the efficacy of the Syngand-generated synthetic target property data on downstream regression tasks with AqSolDB, LD50, and hERG central.
Date: 2024-05-06T19:09:37Z
- Physical formula enhanced multi-task learning for pharmacokinetics prediction [54.13787789006417]
A major challenge for AI-driven drug discovery is the scarcity of high-quality data.
We develop a physical formula enhanced multi-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously.
Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks.
Date: 2024-04-16T07:42:55Z
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
Date: 2023-01-14T15:07:43Z
- ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery [79.08833067391093]
Real-world pharmaceutical datasets often exhibit highly imbalanced distributions.
We introduce ImDrug, a benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning.
It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline.
Date: 2022-09-16T13:35:57Z
- HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network [0.0]
Drug-drug interactions (DDIs) may hamper the functionalities of drugs and, in the worst case, lead to adverse drug reactions (ADRs).
This paper proposes a novel Hypergraph Neural Network (HyGNN) model for the DDI prediction problem that relies only on the SMILES strings of drugs, which are available for any drug.
Our proposed HyGNN model effectively predicts DDIs and outperforms the baselines, with a maximum ROC-AUC and PR-AUC of 97.9% and 98.1%, respectively.
Date: 2022-06-25T22:48:27Z
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet-lab experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
Date: 2022-06-20T14:53:25Z
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations [90.27736364704108]
We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AI-aided drug discovery (AIDD): drug-target binding affinity prediction.
Date: 2022-01-24T12:32:48Z
- CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics [35.67979269269178]
We develop a Coherent Deconfounding Autoencoder (CODE-AE) that can extract both common biological signals shared by incoherent samples and private representations unique to each data set.
CODE-AE significantly improves the accuracy and robustness over state-of-the-art methods in both predicting patient drug response and de-confounding biological signals.
Date: 2021-01-31T21:17:44Z
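The Regressor-free Molecule Generation summary in the list says the method combines a diffusion model's score estimate with the gradient of a regression model trained on numeric labels. The generic Bayes-rule identity behind guidance of this kind is sketched below. The guidance weight $w$, the regressor $f_\phi$ and the Gaussian noise scale $\sigma$ are assumptions introduced for illustration, not details taken from that paper.

```latex
% Bayes' rule applied to the score of a noisy sample x_t given a numeric label y
% (the \nabla_{x_t} \log p(y) term vanishes because p(y) does not depend on x_t)
\nabla_{x_t} \log p(x_t \mid y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t)

% Guided score: diffusion score estimate s_\theta plus a weighted regressor gradient (w assumed)
\tilde{s}(x_t, y) = s_\theta(x_t, t) + w \, \nabla_{x_t} \log p_\phi(y \mid x_t)

% Assumed Gaussian regressor likelihood and its gradient
p_\phi(y \mid x_t) \propto \exp\!\left(-\frac{(y - f_\phi(x_t))^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\nabla_{x_t} \log p_\phi(y \mid x_t) = \frac{y - f_\phi(x_t)}{\sigma^2} \, \nabla_{x_t} f_\phi(x_t)
```

Setting $w = 1$ recovers the conditional score exactly when $s_\theta$ matches the true score; larger $w$ pushes samples harder toward the target label, such as a desired IC50 value.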