ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug
Discovery
- URL: http://arxiv.org/abs/2209.07921v1
- Date: Fri, 16 Sep 2022 13:35:57 GMT
- Title: ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug
Discovery
- Authors: Lanqing Li, Liang Zeng, Ziqi Gao, Shen Yuan, Yatao Bian, Bingzhe Wu,
Hengtong Zhang, Chan Lu, Yang Yu, Wei Liu, Hongteng Xu, Jia Li, Peilin Zhao,
Pheng-Ann Heng
- Abstract summary: Real-world pharmaceutical datasets often exhibit highly imbalanced distributions.
We introduce ImDrug, a benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning.
It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline.
- Score: 79.08833067391093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed rapid development of computational
methods and dataset curation for AI-aided drug discovery (AIDD). However,
real-world pharmaceutical datasets often exhibit highly imbalanced
distributions, a problem that is largely overlooked by the current literature but may
severely compromise the fairness and generalization of machine learning
applications. Motivated by this observation, we introduce ImDrug, a
comprehensive benchmark with an open-source Python library which consists of 4
imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline
algorithms tailored for imbalanced learning. It provides an accessible and
customizable testbed for problems and solutions spanning a broad spectrum of
the drug discovery pipeline such as molecular modeling, drug-target interaction
and retrosynthesis. We conduct extensive empirical studies with novel
evaluation metrics to demonstrate that existing algorithms fall short of
solving medicinal and pharmaceutical challenges in the data imbalance scenario.
We believe that ImDrug opens up avenues for future research and development on
real-world challenges at the intersection of AIDD and deep imbalanced learning.
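To illustrate why imbalance can undermine evaluation, here is a minimal sketch in plain Python. It does not use the ImDrug library or its metrics; the function names and the 95/5 toy labels are assumptions made up for illustration. It compares plain accuracy with balanced accuracy (the mean of per-class recall) for a trivial predictor that always outputs the majority class.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 95 inactive (0) and 5 active (1) compounds.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

print(imbalance_ratio(y_true))            # 19.0
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor scores 95% accuracy while never identifying an active compound, which balanced accuracy exposes as chance-level performance.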
Related papers
- Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding [1.4952056744888913]
Imagand is a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs.
Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research.
(2024-08-14T16:01:02Z)
- Synthetic Data from Diffusion Models Improve Drug Discovery Prediction [1.3686993145787065]
Data sparsity makes data curation difficult for researchers looking to answer key research questions.
We propose a novel diffusion GNN model Syngand capable of generating ligand and pharmacokinetic data end-to-end.
We show initial promising results on the efficacy of the Syngand-generated synthetic target property data on downstream regression tasks with AqSolDB, LD50, and hERG central.
(2024-05-06T19:09:37Z)
- Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction [43.5957881547028]
Recent advances in artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications.
DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body.
To correctly apply advanced AI and deep learning, developers and users face various challenges such as the availability and encoding of data resources.
(2023-06-08T14:54:50Z)
- Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey [6.288056740658763]
Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data.
Recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery.
(2023-02-16T12:38:01Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
(2023-01-14T15:07:43Z)
- Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
(2022-05-20T21:15:26Z)
- Deep learning for drug repurposing: methods, databases, and applications [54.08583498324774]
Repurposing existing drugs for new therapies is an attractive solution that accelerates drug development at reduced experimental costs.
In this review, we introduce guidelines on how to utilize deep learning methodologies and tools for drug repurposing.
(2022-02-08T09:42:08Z)
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations [90.27736364704108]
We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction.
(2022-01-24T12:32:48Z)
- Dropout: Explicit Forms and Capacity Control [57.36692251815882]
We investigate capacity control provided by dropout in various machine learning problems.
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
(2020-03-06T19:10:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.