ALMERIA: Boosting pairwise molecular contrasts with scalable methods
- URL: http://arxiv.org/abs/2305.13254v1
- Date: Fri, 28 Apr 2023 16:27:06 GMT
- Title: ALMERIA: Boosting pairwise molecular contrasts with scalable methods
- Authors: Rafael Mena-Yedra, Juana L. Redondo, Horacio P\'erez-S\'anchez, Pilar
M. Ortigosa
- Abstract summary: ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Searching for potential active compounds in large databases is a necessary
step to reduce time and costs in modern drug discovery pipelines. Such virtual
screening methods seek to provide predictions that allow the search space to be
narrowed down. Although cheminformatics has made great progress in exploiting
the potential of available big data, caution is needed to avoid introducing
bias and provide useful predictions with new compounds. In this work, we
propose the decision-support tool ALMERIA (Advanced Ligand Multiconformational
Exploration with Robust Interpretable Artificial Intelligence) for estimating
compound similarities and activity prediction based on pairwise molecular
contrasts while considering their conformation variability. The methodology
covers the entire pipeline from data preparation to model selection and
hyperparameter optimization. It has been implemented using scalable software
and methods to exploit large volumes of data -- in the order of several
terabytes -- , offering a very quick response even for a large batch of
queries. The implementation and experiments have been performed in a
distributed computer cluster using a benchmark, the public access DUD-E
database. In addition to cross-validation, detailed data split criteria have
been used to evaluate the models on different data partitions to assess their
true generalization ability with new compounds. Experiments show
state-of-the-art performance for molecular activity prediction (ROC AUC:
$0.99$, $0.96$, $0.87$), proving that the chosen data representation and
modeling have good properties to generalize. Molecular conformations --
prediction performance and sensitivity analysis -- have also been evaluated.
Finally, an interpretability analysis has been performed using the SHAP method.
Related papers
- Transfer Learning for Molecular Property Predictions from Small Data Sets [0.0]
We benchmark common machine learning models for the prediction of molecular properties on two small data sets.
We present a transfer learning strategy that uses large data sets to pre-train the respective models and allows to obtain more accurate models after fine-tuning on the original data sets.
arXiv Detail & Related papers (2024-04-20T14:25:34Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage
VAE [1.3597551064547502]
Variational autoencoder (VAE) is a popular method for drug discovery and various architectures and pipelines have been proposed to improve its performance.
VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher dimensional ambient space.
In this paper, we explore applying a multi-stage VAE approach, that can improve manifold recovery on a synthetic dataset, to the field of drug discovery.
arXiv Detail & Related papers (2023-08-24T20:22:22Z) - Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity)
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z) - A Federated Learning-based Industrial Health Prognostics for
Heterogeneous Edge Devices using Matched Feature Extraction [16.337207503536384]
We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm.
We show that the proposed method yields accuracy improvements as high as 44.5% and 39.3% for state-of-health estimation and remaining useful life estimation.
arXiv Detail & Related papers (2023-05-13T07:20:31Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Integrating Transformer and Autoencoder Techniques with Spectral Graph
Algorithms for the Prediction of Scarcely Labeled Molecular Data [2.8360662552057323]
This work introduces three graph-based models incorporating Merriman-Bence-Osher (MBO) techniques to tackle this challenge.
Specifically, graph-based modifications of the MBO scheme is integrated with state-of-the-art techniques, including a home-made transformer and an autoencoder.
The proposed models are validated using five benchmark data sets.
arXiv Detail & Related papers (2022-11-12T22:45:32Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.