DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for
AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise
Annotations
- URL: http://arxiv.org/abs/2201.09637v1
- Date: Mon, 24 Jan 2022 12:32:48 GMT
- Title: DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for
AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise
Annotations
- Authors: Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang,
Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu,
Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao,
Yatao Bian
- Abstract summary: We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction.
- Score: 90.27736364704108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-aided drug discovery (AIDD) is gaining increasing popularity due to its
promise of making the search for new pharmaceuticals quicker, cheaper and more
efficient. In spite of its extensive use in many fields, such as ADMET
prediction, virtual screening, protein folding and generative chemistry, little
has been explored in terms of the out-of-distribution (OOD) learning problem
with noise, which is inevitable in real-world AIDD applications.
In this work, we present DrugOOD, a systematic OOD dataset curator and
benchmark for AI-aided drug discovery, which comes with an open-source Python
package that fully automates the data curation and OOD benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding
affinity prediction, which involves both a macromolecule (protein target) and
a small molecule (drug compound). Rather than only providing fixed datasets,
DrugOOD offers an automated dataset curator with user-friendly customization
scripts, rich domain annotations aligned with biochemistry knowledge, realistic
noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms.
Since the molecular data is often modeled as irregular graphs using graph
neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for
graph OOD learning problems. Extensive empirical studies have shown a
significant performance gap between in-distribution and out-of-distribution
experiments, which highlights the need to develop better schemes that can allow
for OOD generalization under noise for AIDD.
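The in-distribution versus out-of-distribution gap described in the abstract depends on how the data is split by domain. The following is a minimal, generic sketch of a domain-based split, in which every domain (for example, a molecular scaffold) lands in exactly one of train, validation, or test. It is not the DrugOOD package's API: the function name `domain_split`, the record fields `id`, `scaffold`, and `label`, and the size-ordered assignment rule are all assumptions made for illustration.

```python
from collections import defaultdict


def domain_split(records, domain_key, train_frac=0.6, val_frac=0.2):
    """Split records so that each domain appears in exactly one subset.

    Hypothetical helper: domains are ordered from largest to smallest and
    assigned to train, then val, then test as the running fraction grows.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[domain_key]].append(rec)

    # Largest domains first; ties broken by domain name for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))

    splits = {"train": [], "val": [], "test": []}
    seen = 0
    for _domain, members in ordered:
        frac = seen / len(records)
        if frac < train_frac:
            name = "train"
        elif frac < train_frac + val_frac:
            name = "val"
        else:
            name = "test"
        splits[name].extend(members)
        seen += len(members)
    return splits


# Toy data: ten records spread over four made-up scaffold domains.
records = [
    {"id": i, "scaffold": s, "label": i % 2}
    for i, s in enumerate("AAAABBBCCD")
]

splits = domain_split(records, "scaffold")
domains = {
    name: {r["scaffold"] for r in rows} for name, rows in splits.items()
}
print({name: len(rows) for name, rows in splits.items()})
print(domains)
```

Because no domain is shared between subsets, a model evaluated on `splits["test"]` sees only unseen domains, whereas a random split of the same records would typically mix domains across subsets and give an in-distribution estimate instead.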
Related papers
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules are much lower, which facilitates detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- LINe: Out-of-Distribution Detection by Leveraging Important Neurons [15.797257361788812]
We introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data.
We propose a novel method, Leveraging Important Neurons (LINe), for post-hoc out-of-distribution detection.
arXiv Detail & Related papers (2023-03-24T13:49:05Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery [79.08833067391093]
Real-world pharmaceutical datasets often exhibit highly imbalanced distribution.
We introduce ImDrug, a benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning.
It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline.
arXiv Detail & Related papers (2022-09-16T13:35:57Z)
- Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data [7.221206118679026]
We show that existing post-hoc methods perform quite differently compared to when evaluated only on OOD detection.
We propose a novel method for SCOD, Softmax Information Retaining Combination (SIRC), that augments softmax-based confidence scores with feature-agnostic information.
Experiments on a wide variety of ImageNet-scale datasets and convolutional neural network architectures show that SIRC is able to consistently match or outperform the baseline for SCOD.
arXiv Detail & Related papers (2022-07-15T14:39:57Z)
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data, which naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization [64.56399911605286]
We propose SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module.
SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low-data relation types.
arXiv Detail & Related papers (2020-10-04T00:14:57Z)
- FOOD: Fast Out-Of-Distribution Detector [43.31844129399436]
FOOD is an extended deep neural network (DNN) capable of efficiently detecting OOD samples with minimal inference time overhead.
We evaluate FOOD's detection performance on the SVHN, CIFAR-10, and CIFAR-100 datasets.
Our results demonstrate that in addition to achieving state-of-the-art performance, FOOD is fast and applicable to real-world applications.
arXiv Detail & Related papers (2020-08-16T08:22:43Z)
- Drug-Drug Interaction Prediction with Wasserstein Adversarial Autoencoder-based Knowledge Graph Embeddings [22.562175708415392]
We propose a new knowledge graph embedding framework for drug-drug interactions.
In our framework, the autoencoder is employed to generate high-quality negative samples.
The discriminator learns the embeddings of drugs and interactions based on both positive and negative triplets.
arXiv Detail & Related papers (2020-04-15T21:03:29Z)