Accelerating Black-Box Molecular Property Optimization by Adaptively
Learning Sparse Subspaces
- URL: http://arxiv.org/abs/2401.01398v1
- Date: Tue, 2 Jan 2024 18:34:29 GMT
- Authors: Farshud Sorourifar, Thomas Banker, Joel A. Paulson
- Abstract summary: We show that our proposed method substantially outperforms existing MPO methods on a variety of benchmark and real-world problems.
Specifically, we show that our method can routinely find near-optimal molecules out of a set of more than 100,000 alternatives within 100 or fewer expensive queries.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Molecular property optimization (MPO) problems are inherently challenging
since they are formulated over discrete, unstructured spaces and the labeling
process involves expensive simulations or experiments, which fundamentally
limits the amount of available data. Bayesian optimization (BO) is a powerful
and popular framework for efficient optimization of noisy, black-box objective
functions (e.g., measured property values), and is thus a potentially attractive
framework for MPO. To apply BO to MPO problems, one must select a structured
molecular representation that enables construction of a probabilistic surrogate
model. Many molecular representations have been developed; however, they are
all high-dimensional, which introduces important challenges in the BO process
-- mainly because the curse of dimensionality makes it difficult to define and
perform inference over a suitable class of surrogate models. This challenge has
been recently addressed by learning a lower-dimensional encoding of a SMILES or
graph representation of a molecule in an unsupervised manner and then
performing BO in the encoded space. In this work, we show that such methods
have a tendency to "get stuck," which we hypothesize occurs since the mapping
from the encoded space to property values is not necessarily well-modeled by a
Gaussian process. We argue for an alternative approach that combines numerical
molecular descriptors with a sparse axis-aligned Gaussian process model, which
is capable of rapidly identifying sparse subspaces that are most relevant to
modeling the unknown property function. We demonstrate that our proposed method
substantially outperforms existing MPO methods on a variety of benchmark and
real-world problems. Specifically, we show that our method can routinely find
near-optimal molecules out of a set of more than 100,000 alternatives within
100 or fewer expensive queries.
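The core modeling idea — a Gaussian process over numerical molecular descriptors whose axis-aligned, per-dimension lengthscales switch off irrelevant descriptors — can be sketched in a few dozen lines. The toy descriptor matrix, property function, and correlation-based relevance rule below are illustrative assumptions only; the paper itself uses a sparse axis-aligned subspace (SAAS) shrinkage prior with Hamiltonian Monte Carlo inference, not this heuristic:

```python
# Hedged sketch of descriptor-based BO with a sparse axis-aligned GP surrogate.
# Toy descriptors, the property function, and the correlation-based sparsity
# rule are assumptions for illustration, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

# Candidate "molecules": 500 rows of 10 numerical descriptors.
# Only descriptors 0 and 3 influence the (unknown) property.
X = rng.uniform(-1.0, 1.0, size=(500, 10))

def property_fn(x):
    return -(x[0] - 0.3) ** 2 - (x[3] + 0.5) ** 2  # toy black-box property

y_all = np.array([property_fn(x) for x in X])

def gp_posterior(Xtr, ytr, Xte, lengthscales, noise=1e-4):
    """Exact GP regression with a unit-variance axis-aligned RBF kernel."""
    def k(A, B):
        d = (A[:, None, :] - B[None, :, :]) / lengthscales
        return np.exp(-0.5 * (d ** 2).sum(-1))
    K = k(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = k(Xte, Xtr)
    mu = Ks @ np.linalg.solve(K, ytr)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.clip(var, 0.0, None)

def sparse_lengthscales(Xtr, ytr, tau=0.15):
    """Crude stand-in for the SAAS prior: a dimension is 'on' only if its
    descriptor correlates with the observed labels; 'off' dimensions get an
    effectively infinite lengthscale and drop out of the kernel."""
    ls = np.full(Xtr.shape[1], 1e6)
    if np.std(ytr) == 0.0:
        return ls
    for d in range(Xtr.shape[1]):
        r = abs(np.corrcoef(Xtr[:, d], ytr)[0, 1])
        if r > tau:
            ls[d] = 0.5
    return ls

# BO loop: start from 10 random queries, then repeatedly query the
# upper-confidence-bound maximizer among unlabeled candidates.
idx = list(rng.choice(len(X), size=10, replace=False))
for _ in range(20):
    Xtr, ytr = X[idx], y_all[idx]
    ls = sparse_lengthscales(Xtr, ytr)
    cand = np.setdiff1d(np.arange(len(X)), idx)
    mu, var = gp_posterior(Xtr, ytr, X[cand], ls)
    idx.append(int(cand[np.argmax(mu + 2.0 * np.sqrt(var))]))

best = y_all[idx].max()  # best property value found within 30 queries
```

In the actual method the sparsity pattern is not thresholded but inferred by sampling the lengthscale posterior, which is what lets the surrogate adapt its active subspace as data accrues over the query budget.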
Related papers
- Score-based Diffusion Models in Function Space (arXiv, 2023-02-14)
  Diffusion models have recently emerged as a powerful framework for generative modeling.
  We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
  We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation (arXiv, 2023-02-10)
  We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
  We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
  Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization (arXiv, 2022-10-27)
  We study the effectiveness of various zeroth-order (ZO) optimization methods for optimizing molecular objectives.
  We show the advantages of ZO sign-based gradient descent (ZO-signGD).
  We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
- Improving Small Molecule Generation using Mutual Information Machine (arXiv, 2022-08-18)
  MolMIM is a probabilistic auto-encoder for small-molecule drug discovery.
  We demonstrate MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.
  We then utilize CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.
- Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces (arXiv, 2021-11-01)
  We consider the problem of optimizing combinatorial spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations.
  A recent BO approach for combinatorial spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures.
  This paper proposes a principled approach referred to as LADDER to overcome this drawback.
- A data-driven peridynamic continuum model for upscaling molecular dynamics (arXiv, 2021-08-04)
  We propose a learning framework to extract, from molecular dynamics data, an optimal Linear Peridynamic Solid (LPS) model.
  We provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions.
  This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and generalizes well to settings different from those used during training.
- Conservative Objective Models for Effective Offline Model-Based Optimization (arXiv, 2021-07-14)
  Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
  We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
  Conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
- Designing Air Flow with Surrogate-assisted Phenotypic Niching (arXiv, 2021-05-10)
  We introduce surrogate-assisted phenotypic niching, a quality-diversity algorithm.
  It allows discovering a large, diverse set of behaviors by using computationally expensive phenotypic features.
  In this work we discover the types of air flow in a 2D fluid dynamics optimization problem.
- High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces (arXiv, 2021-02-27)
  We argue that a surrogate model defined on sparse axis-aligned subspaces offers an attractive compromise between flexibility and parsimony.
  We demonstrate that our approach, which relies on Hamiltonian Monte Carlo for inference, can rapidly identify sparse subspaces relevant to modeling the unknown objective function.
- Continuous surrogate-based optimization algorithms are well-suited for expensive discrete problems (arXiv, 2020-11-06)
  We present empirical evidence showing that the use of continuous surrogate models displays competitive performance against state-of-the-art discrete surrogate-based methods.
  Our experiments on different discrete structures and time constraints also give more insight into which algorithms work well on which type of problem.
- Goal-directed Generation of Discrete Structures with Conditional Generative Models (arXiv, 2020-10-05)
  We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
  We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.