Accelerating Black-Box Molecular Property Optimization by Adaptively
Learning Sparse Subspaces
- URL: http://arxiv.org/abs/2401.01398v1
- Date: Tue, 2 Jan 2024 18:34:29 GMT
- Title: Accelerating Black-Box Molecular Property Optimization by Adaptively
Learning Sparse Subspaces
- Authors: Farshud Sorourifar, Thomas Banker, Joel A. Paulson
- Abstract summary: We show that our proposed method substantially outperforms existing MPO methods on a variety of benchmark and real-world problems.
Specifically, we show that our method can routinely find near-optimal molecules out of a set of more than $>100$k alternatives within 100 or fewer expensive queries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Molecular property optimization (MPO) problems are inherently challenging
since they are formulated over discrete, unstructured spaces and the labeling
process involves expensive simulations or experiments, which fundamentally
limits the amount of available data. Bayesian optimization (BO) is a powerful
and popular framework for efficient optimization of noisy, black-box objective
functions (e.g., measured property values), thus is a potentially attractive
framework for MPO. To apply BO to MPO problems, one must select a structured
molecular representation that enables construction of a probabilistic surrogate
model. Many molecular representations have been developed, however, they are
all high-dimensional, which introduces important challenges in the BO process
-- mainly because the curse of dimensionality makes it difficult to define and
perform inference over a suitable class of surrogate models. This challenge has
been recently addressed by learning a lower-dimensional encoding of a SMILE or
graph representation of a molecule in an unsupervised manner and then
performing BO in the encoded space. In this work, we show that such methods
have a tendency to "get stuck," which we hypothesize occurs since the mapping
from the encoded space to property values is not necessarily well-modeled by a
Gaussian process. We argue for an alternative approach that combines numerical
molecular descriptors with a sparse axis-aligned Gaussian process model, which
is capable of rapidly identifying sparse subspaces that are most relevant to
modeling the unknown property function. We demonstrate that our proposed method
substantially outperforms existing MPO methods on a variety of benchmark and
real-world problems. Specifically, we show that our method can routinely find
near-optimal molecules out of a set of more than $>100$k alternatives within
100 or fewer expensive queries.
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM)
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z) - MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
This paper introduces a novel paradigm for learning molecule generative models based on functional representations.
We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space.
arXiv Detail & Related papers (2024-10-16T13:02:02Z) - Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - Sensitivity analysis using the Metamodel of Optimal Prognosis [0.0]
In real case applications within the virtual prototyping process, it is not always possible to reduce the complexity of the physical models.
We present an automatic approach for the selection of the optimal suitable meta-model for the actual problem.
arXiv Detail & Related papers (2024-08-07T07:09:06Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z) - Combining Latent Space and Structured Kernels for Bayesian Optimization
over Combinatorial Spaces [27.989924313988016]
We consider the problem of optimizing spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations.
A recent BO approach for spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures.
This paper proposes a principled approach referred as LADDER to overcome this drawback.
arXiv Detail & Related papers (2021-11-01T18:26:22Z) - A data-driven peridynamic continuum model for upscaling molecular
dynamics [3.1196544696082613]
We propose a learning framework to extract, from molecular dynamics data, an optimal Linear Peridynamic Solid model.
We provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions.
This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training.
arXiv Detail & Related papers (2021-08-04T07:07:47Z) - Designing Air Flow with Surrogate-assisted Phenotypic Niching [117.44028458220427]
We introduce surrogate-assisted phenotypic niching, a quality diversity algorithm.
It allows to discover a large, diverse set of behaviors by using computationally expensive phenotypic features.
In this work we discover the types of air flow in a 2D fluid dynamics optimization problem.
arXiv Detail & Related papers (2021-05-10T10:45:28Z) - High-Dimensional Bayesian Optimization with Sparse Axis-Aligned
Subspaces [14.03847432040056]
We argue that a surrogate model defined on sparse axis-aligned subspaces offer an attractive compromise between flexibility and parsimony.
We demonstrate that our approach, which relies on Hamiltonian Monte Carlo for inference, can rapidly identify sparse subspaces relevant to modeling the unknown objective function.
arXiv Detail & Related papers (2021-02-27T23:06:24Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.