Current Methods for Drug Property Prediction in the Real World
- URL: http://arxiv.org/abs/2309.17161v1
- Date: Tue, 25 Jul 2023 17:50:05 GMT
- Title: Current Methods for Drug Property Prediction in the Real World
- Authors: Jacob Green, Cecilia Cabrera Diaz, Maximilian A. H. Jakobs, Andrea
Dimitracopoulos, Mark van der Wilk, Ryan D. Greenhalgh
- Abstract summary: Predicting drug properties is key in drug discovery to enable de-risking of assets before expensive clinical trials.
It remains unclear for practitioners which method or approach is most suitable, as different papers benchmark on different datasets and methods.
Our large-scale empirical study links together numerous earlier works on different datasets and methods.
We discover that the best method depends on the dataset, and that engineered features with classical ML methods often outperform deep learning.
- Score: 9.061842820405486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting drug properties is key in drug discovery to enable de-risking of
assets before expensive clinical trials, and to find highly active compounds
faster. Interest from the Machine Learning community has led to the release of
a variety of benchmark datasets and proposed methods. However, it remains
unclear for practitioners which method or approach is most suitable, as
different papers benchmark on different datasets and methods, leading to
varying conclusions that are not easily compared. Our large-scale empirical
study links together numerous earlier works on different datasets and methods,
thus offering a comprehensive overview of the existing property classes,
datasets, and their interactions with different methods. We emphasise the
importance of uncertainty quantification and the time, and therefore cost, of
applying these methods in the drug development decision-making cycle. We
discover that the best method depends on the dataset, and that engineered
features with classical ML methods often outperform deep learning.
Specifically, QSAR datasets are typically best analysed with classical methods
such as Gaussian Processes while ADMET datasets are sometimes better described
by Trees or Deep Learning methods such as Graph Neural Networks or language
models. Our work highlights that practitioners do not yet have a
straightforward, black-box procedure to rely on, and sets the precedent for
creating practitioner-relevant benchmarks. Deep learning approaches must be
proven on these benchmarks to become the practical method of choice in drug
property prediction.
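The abstract's two practical points, that engineered features with classical methods such as Gaussian Processes often suit QSAR data and that uncertainty quantification matters, can be illustrated with a minimal sketch. This is not the authors' pipeline: the Tanimoto kernel, the toy fingerprints (given as sets of on-bit indices), the `noise` value, and every function name below are assumptions chosen for illustration only.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(train_fps, train_y, test_fp, noise=0.1):
    """Gaussian Process regression with a Tanimoto kernel.

    Returns the predictive mean and the variance of the latent function
    (observation noise is not added to the returned variance).
    """
    n = len(train_fps)
    mean_y = sum(train_y) / n  # constant prior mean (an assumption)
    K = [[tanimoto(train_fps[i], train_fps[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    centered = [y - mean_y for y in train_y]
    alpha = solve_upper_t(L, solve_lower(L, centered))  # (K + noise*I)^-1 y
    k_star = [tanimoto(fp, test_fp) for fp in train_fps]
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = tanimoto(test_fp, test_fp) - sum(x * x for x in v)
    return mean, max(var, 0.0)

# Toy data: hypothetical fingerprints and activity values, not real measurements.
train_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}]
train_y = [5.0, 5.2, 7.0]

near = gp_predict(train_fps, train_y, {1, 2, 3, 4})  # resembles two training compounds
far = gp_predict(train_fps, train_y, {20, 21})       # shares no bits with training data
print("near:", near)
print("far: ", far)
```

The compound that shares no bits with the training set falls back to the prior mean with the full prior variance, while the compound resembling known ones gets a tighter estimate. That difference is the kind of signal a practitioner could use when deciding whether a prediction is trustworthy enough to act on.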
Related papers
- A large dataset curation and benchmark for drug target interaction [0.7699646945563469]
Bioactivity data plays a key role in drug discovery and repurposing.
We propose a way to standardize and represent efficiently a very large dataset curated from multiple public sources.
arXiv Detail & Related papers (2024-01-30T17:06:25Z)
- Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z)
- A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms [11.264467955516706]
We propose four approaches to assessing the difficulty and appropriateness of 13 established datasets.
We show that most of the popular datasets pose rather easy classification tasks.
We propose a new methodology for yielding benchmark datasets.
arXiv Detail & Related papers (2023-07-03T07:54:54Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Time Associated Meta Learning for Clinical Prediction [78.99422473394029]
We propose a novel time associated meta learning (TAML) method to make effective predictions at multiple future time points.
To address the sparsity problem after task splitting, TAML employs a temporal information sharing strategy to augment the number of positive samples.
We demonstrate the effectiveness of TAML on multiple clinical datasets, where it consistently outperforms a range of strong baselines.
arXiv Detail & Related papers (2023-03-05T03:54:54Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Siloed Federated Learning for Multi-Centric Histopathology Datasets [0.17842332554022694]
This paper proposes a novel federated learning approach for deep learning architectures in the medical domain.
Local-statistic batch normalization (BN) layers are introduced, resulting in collaboratively-trained, yet center-specific models.
We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets.
arXiv Detail & Related papers (2020-08-17T15:49:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.