Accelerating Black-Box Molecular Property Optimization by Adaptively
Learning Sparse Subspaces
- URL: http://arxiv.org/abs/2401.01398v1
- Date: Tue, 2 Jan 2024 18:34:29 GMT
- Authors: Farshud Sorourifar, Thomas Banker, Joel A. Paulson
- Abstract summary: We show that our proposed method substantially outperforms existing MPO methods on a variety of benchmark and real-world problems.
Specifically, we show that our method can routinely find near-optimal molecules out of a set of more than 100,000 alternatives within 100 or fewer expensive queries.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Molecular property optimization (MPO) problems are inherently challenging
since they are formulated over discrete, unstructured spaces and the labeling
process involves expensive simulations or experiments, which fundamentally
limits the amount of available data. Bayesian optimization (BO) is a powerful
and popular framework for efficient optimization of noisy, black-box objective
functions (e.g., measured property values), and is thus a potentially attractive
framework for MPO. To apply BO to MPO problems, one must select a structured
molecular representation that enables construction of a probabilistic surrogate
model. Many molecular representations have been developed; however, they are
all high-dimensional, which introduces important challenges in the BO process
-- mainly because the curse of dimensionality makes it difficult to define and
perform inference over a suitable class of surrogate models. This challenge has
been recently addressed by learning a lower-dimensional encoding of a SMILES or
graph representation of a molecule in an unsupervised manner and then
performing BO in the encoded space. In this work, we show that such methods
have a tendency to "get stuck," which we hypothesize occurs since the mapping
from the encoded space to property values is not necessarily well-modeled by a
Gaussian process. We argue for an alternative approach that combines numerical
molecular descriptors with a sparse axis-aligned Gaussian process model, which
is capable of rapidly identifying sparse subspaces that are most relevant to
modeling the unknown property function. We demonstrate that our proposed method
substantially outperforms existing MPO methods on a variety of benchmark and
real-world problems. Specifically, we show that our method can routinely find
near-optimal molecules out of a set of more than 100,000 alternatives within
100 or fewer expensive queries.
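The core modeling idea — a Gaussian process over numerical molecular descriptors whose axis-aligned, per-dimension lengthscales switch off irrelevant descriptors — can be sketched in a few dozen lines. The toy descriptor matrix, property function, and correlation-based relevance rule below are illustrative assumptions only; the paper itself uses a sparse axis-aligned subspace (SAAS) shrinkage prior with Hamiltonian Monte Carlo inference, not this heuristic:

```python
# Hedged sketch of descriptor-based BO with a sparse axis-aligned GP surrogate.
# Toy descriptors, the property function, and the correlation-based sparsity
# rule are assumptions for illustration, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

# Candidate "molecules": 500 rows of 10 numerical descriptors.
# Only descriptors 0 and 3 influence the (unknown) property.
X = rng.uniform(-1.0, 1.0, size=(500, 10))

def property_fn(x):
    return -(x[0] - 0.3) ** 2 - (x[3] + 0.5) ** 2  # toy black-box property

y_all = np.array([property_fn(x) for x in X])

def gp_posterior(Xtr, ytr, Xte, lengthscales, noise=1e-4):
    """Exact GP regression with a unit-variance axis-aligned RBF kernel."""
    def k(A, B):
        d = (A[:, None, :] - B[None, :, :]) / lengthscales
        return np.exp(-0.5 * (d ** 2).sum(-1))
    K = k(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = k(Xte, Xtr)
    mu = Ks @ np.linalg.solve(K, ytr)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.clip(var, 0.0, None)

def sparse_lengthscales(Xtr, ytr, tau=0.15):
    """Crude stand-in for the SAAS prior: a dimension is 'on' only if its
    descriptor correlates with the observed labels; 'off' dimensions get an
    effectively infinite lengthscale and drop out of the kernel."""
    ls = np.full(Xtr.shape[1], 1e6)
    if np.std(ytr) == 0.0:
        return ls
    for d in range(Xtr.shape[1]):
        r = abs(np.corrcoef(Xtr[:, d], ytr)[0, 1])
        if r > tau:
            ls[d] = 0.5
    return ls

# BO loop: start from 10 random queries, then repeatedly query the
# upper-confidence-bound maximizer among unlabeled candidates.
idx = list(rng.choice(len(X), size=10, replace=False))
for _ in range(20):
    Xtr, ytr = X[idx], y_all[idx]
    ls = sparse_lengthscales(Xtr, ytr)
    cand = np.setdiff1d(np.arange(len(X)), idx)
    mu, var = gp_posterior(Xtr, ytr, X[cand], ls)
    idx.append(int(cand[np.argmax(mu + 2.0 * np.sqrt(var))]))

best = y_all[idx].max()  # best property value found within 30 queries
```

In the actual method the sparsity pattern is not thresholded but inferred by sampling the lengthscale posterior, which is what lets the surrogate adapt its active subspace as data accrues over the query budget.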
Related papers
- Score-based Diffusion Models in Function Space (arXiv, 2023-02-14)
  Diffusion models have recently emerged as a powerful framework for generative modeling.
  We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
  We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation (arXiv, 2023-02-10)
  We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
  We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
  Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization (arXiv, 2022-10-27)
  We study the effectiveness of various zeroth-order (ZO) optimization methods for optimizing molecular objectives.
  We show the advantages of ZO sign-based gradient descent (ZO-signGD).
  We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
- Improving Small Molecule Generation using Mutual Information Machine (arXiv, 2022-08-18)
  MolMIM is a probabilistic auto-encoder for small-molecule drug discovery.
  We demonstrate MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.
  We then utilize CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.
- Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces (arXiv, 2021-11-01)
  We consider the problem of optimizing combinatorial spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations.
  A recent BO approach for combinatorial spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures.
  This paper proposes a principled approach referred to as LADDER to overcome this drawback.
- A data-driven peridynamic continuum model for upscaling molecular dynamics (arXiv, 2021-08-04)
  We propose a learning framework to extract, from molecular dynamics data, an optimal Linear Peridynamic Solid (LPS) model.
  We provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions.
  This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and generalizes well to settings different from those used during training.
- Conservative Objective Models for Effective Offline Model-Based Optimization (arXiv, 2021-07-14)
  Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
  We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
  Conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
- Designing Air Flow with Surrogate-assisted Phenotypic Niching (arXiv, 2021-05-10)
  We introduce surrogate-assisted phenotypic niching, a quality-diversity algorithm.
  It allows discovering a large, diverse set of behaviors by using computationally expensive phenotypic features.
  In this work we discover the types of air flow in a 2D fluid dynamics optimization problem.
- High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces (arXiv, 2021-02-27)
  We argue that a surrogate model defined on sparse axis-aligned subspaces offers an attractive compromise between flexibility and parsimony.
  We demonstrate that our approach, which relies on Hamiltonian Monte Carlo for inference, can rapidly identify sparse subspaces relevant to modeling the unknown objective function.
- Continuous surrogate-based optimization algorithms are well-suited for expensive discrete problems (arXiv, 2020-11-06)
  We present empirical evidence showing that the use of continuous surrogate models displays competitive performance against state-of-the-art discrete surrogate-based methods.
  Our experiments on different discrete structures and time constraints also give more insight into which algorithms work well on which type of problem.
- Goal-directed Generation of Discrete Structures with Conditional Generative Models (arXiv, 2020-10-05)
  We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
  We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.