Gaussian Process Molecule Property Prediction with FlowMO
- URL: http://arxiv.org/abs/2010.01118v2
- Date: Wed, 14 Oct 2020 08:23:55 GMT
- Title: Gaussian Process Molecule Property Prediction with FlowMO
- Authors: Henry B. Moss, Ryan-Rhys Griffiths
- Abstract summary: FlowMO is an open-source library for molecular property prediction with Gaussian Processes.
It enables the user to make predictions with well-calibrated uncertainty estimates, an output central to active learning and molecular design applications.
- Score: 7.72630981555675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present FlowMO: an open-source Python library for molecular property
prediction with Gaussian Processes. Built upon GPflow and RDKit, FlowMO enables
the user to make predictions with well-calibrated uncertainty estimates, an
output central to active learning and molecular design applications. Gaussian
Processes are particularly attractive for modelling small molecular datasets, a
characteristic of many real-world virtual screening campaigns where
high-quality experimental data is scarce. Computational experiments across
three small datasets demonstrate comparable predictive performance to deep
learning methods but with superior uncertainty calibration.
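To make the idea of a prediction with an uncertainty estimate concrete, here is a minimal sketch of exact Gaussian Process regression on binary molecular fingerprints. It is not FlowMO's API and does not use GPflow or RDKit; it is plain NumPy. The Tanimoto kernel, the toy 8-bit fingerprints, the function names (`tanimoto_kernel`, `gp_predict`) and the `noise_var` value are all assumptions chosen for illustration.

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto similarity between rows of binary fingerprint matrices A and B.

    Assumption: this kernel is a common choice for fingerprints; the abstract
    does not say which kernels the library provides.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    cross = A @ B.T                          # number of shared "on" bits
    a_sq = (A * A).sum(axis=1)[:, None]      # "on" bits per row of A
    b_sq = (B * B).sum(axis=1)[None, :]      # "on" bits per row of B
    denom = a_sq + b_sq - cross
    # Two all-zero fingerprints give 0/0; this sketch assigns them similarity 0.
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """Posterior mean and variance of the latent function at X_test."""
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()                  # constant prior mean (hypothetical choice)
    K = tanimoto_kernel(X_train, X_train) + noise_var * np.eye(len(y_train))
    K_s = tanimoto_kernel(X_train, X_test)   # train/test cross-covariance
    K_ss_diag = np.diag(tanimoto_kernel(X_test, X_test))

    L = np.linalg.cholesky(K)                # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = K_ss_diag - (v * v).sum(axis=0)    # shrinks near training molecules
    return mean, var


# Toy data: four hypothetical 8-bit fingerprints with made-up property values.
X_train = [[1, 1, 0, 0, 1, 0, 0, 0],
           [1, 1, 0, 0, 0, 1, 0, 0],
           [0, 0, 1, 1, 0, 0, 1, 0],
           [0, 0, 1, 1, 0, 0, 0, 1]]
y_train = [1.0, 1.2, -0.8, -1.0]

# First query equals a training molecule; second is only weakly similar to any.
X_test = [[1, 1, 0, 0, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 1, 1]]

mean, var = gp_predict(X_train, y_train, X_test)
for m, s2 in zip(mean, var):
    print(f"prediction {m:+.3f}, predictive variance {s2:.3f}")
```

The query that matches a training molecule receives a much smaller predictive variance than the dissimilar one. An active learning loop can use exactly this signal, for example by selecting the candidate with the largest variance for the next experiment.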
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - On the Interplay of Subset Selection and Informed Graph Neural Networks [3.091456764812509]
This work focuses on predicting the atomization energy of molecules in the QM9 dataset.
We show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques.
We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
arXiv Detail & Related papers (2023-06-15T09:09:27Z) - ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z) - Mixtures of Gaussian process experts based on kernel stick-breaking processes [0.6396288020763143]
We propose a new mixture model of Gaussian process experts based on kernel stick-breaking processes.
Our model maintains the intuitive appeal of existing models yet improves on their performance.
The model behaviour and improved predictive performance are demonstrated in experiments using six datasets.
arXiv Detail & Related papers (2023-04-26T21:23:01Z) - Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling.
This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Conditional Neural Processes for Molecules [0.0]
Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian Processes (GPs).
This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models.
CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in QSAR modelling, as well as an alternative model for transfer learning based on pre-training and refining neural network regressors.
arXiv Detail & Related papers (2022-10-17T16:10:12Z) - Building Robust Machine Learning Models for Small Chemical Science Data: The Case of Shear Viscosity [3.4761212729163313]
We train several Machine Learning models to predict the shear viscosity of a Lennard-Jones (LJ) fluid.
Specifically, the issues related to model selection, performance estimation and uncertainty quantification were investigated.
arXiv Detail & Related papers (2022-08-23T07:33:14Z) - Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction [121.97742787439546]
Accurately predicting the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep Active Learning methods are developed to select only the most representative and informative data for annotating.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z) - Knowledge transfer across cell lines using Hybrid Gaussian Process models with entity embedding vectors [62.997667081978825]
A large number of experiments are performed to develop a biochemical process.
If we could exploit data from already developed processes to make predictions for a novel process, we could significantly reduce the number of experiments needed.
arXiv Detail & Related papers (2020-11-27T17:38:15Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
arXiv Detail & Related papers (2019-12-30T17:36:33Z)