Beam Enumeration: Probabilistic Explainability For Sample Efficient
Self-conditioned Molecular Design
- URL: http://arxiv.org/abs/2309.13957v2
- Date: Sun, 3 Mar 2024 16:23:00 GMT
- Title: Beam Enumeration: Probabilistic Explainability For Sample Efficient
Self-conditioned Molecular Design
- Authors: Jeff Guo, Philippe Schwaller
- Abstract summary: Generative molecular design has moved from proof-of-concept to real-world applicability.
Key challenges in explainability and sample efficiency present opportunities to enhance generative design.
Beam Enumeration is generally applicable to any language-based molecular generative model.
- Score: 0.4769602527256662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative molecular design has moved from proof-of-concept to real-world
applicability, as marked by the surge in very recent papers reporting
experimental validation. Key challenges in explainability and sample efficiency
present opportunities to enhance generative design to directly optimize
expensive high-fidelity oracles and provide actionable insights to domain
experts. Here, we propose Beam Enumeration to exhaustively enumerate the most
probable sub-sequences from language-based molecular generative models and show
that molecular substructures can be extracted. When coupled with reinforcement
learning, extracted substructures become meaningful, providing a source of
explainability and improving sample efficiency through self-conditioned
generation. Beam Enumeration is generally applicable to any language-based
molecular generative model and notably further improves the performance of the
recently reported Augmented Memory algorithm, which achieved the new
state-of-the-art on the Practical Molecular Optimization benchmark for sample
efficiency. The combined algorithm generates more high-reward molecules, and
generates them faster, given a fixed oracle budget. Beam Enumeration shows that improvements
to explainability and sample efficiency for molecular design can be made
synergistic.
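The enumeration step described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading: at each step, every partial sequence is extended with its top-k most probable next tokens, so n steps yield k^n sub-sequences. The names `toy_next_token_probs` and `beam_enumerate` and the three-token vocabulary are hypothetical stand-ins for a trained molecular language model, not the authors' implementation. Substructure extraction and the reinforcement learning loop are not shown.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a token prefix to next-token probabilities.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def toy_next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy stand-in for a trained SMILES language model (an assumption)."""
    if not prefix:
        return {"C": 0.6, "N": 0.3, "O": 0.1}
    if prefix[-1] == "C":
        return {"C": 0.5, "O": 0.3, "N": 0.2}
    return {"C": 0.7, "N": 0.2, "O": 0.1}


def beam_enumerate(
    next_token_probs: NextTokenFn, top_k: int, n_steps: int
) -> List[Tuple[Tuple[str, ...], float]]:
    """Extend every beam with its top_k next tokens for n_steps.

    Returns top_k ** n_steps (sub-sequence, log-probability) pairs,
    sorted from most to least probable.
    """
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(n_steps):
        expanded = []
        for prefix, logp in beams:
            probs = next_token_probs(prefix)
            # Keep only the top_k most probable continuations of this beam.
            best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            for token, p in best:
                expanded.append((prefix + (token,), logp + math.log(p)))
        beams = expanded
    return sorted(beams, key=lambda b: b[1], reverse=True)


subsequences = beam_enumerate(toy_next_token_probs, top_k=2, n_steps=3)
most_probable = "".join(subsequences[0][0])
```

With `top_k=2` and `n_steps=3`, the sketch returns eight sub-sequences. Because the number of beams grows exponentially with the number of steps, a practical run has to keep both parameters small.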
Related papers
- Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
We propose a novel and general alignment framework to align pretrained target diffusion models with preferred functional properties, named AliDiff.
AliDiff shifts the target-conditioned chemical distribution towards regions with higher binding affinity and structural rationality, specified by user-defined reward functions.
We show that AliDiff can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score, while maintaining strong molecular properties.
arXiv Detail & Related papers (2024-07-01T06:10:29Z)
- Diffusion Model for Data-Driven Black-Box Optimization [54.25693582870226]
We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization.
We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons.
Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
arXiv Detail & Related papers (2024-03-20T00:41:12Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- Variational Autoencoding Molecular Graphs with Denoising Diffusion Probabilistic Model [0.0]
We propose a novel deep generative model that incorporates a hierarchical structure into the probabilistic latent vectors.
We demonstrate, in experiments on small datasets of physical properties and activity, that our model can design effective molecular latent vectors for molecular property prediction.
arXiv Detail & Related papers (2023-07-02T17:29:41Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- Augmented Memory: Capitalizing on Experience Replay to Accelerate De Novo Molecular Design [0.0]
Molecular generative models should learn to satisfy a desired objective under minimal oracle evaluations.
We propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay.
We show that scores obtained from oracle calls can be reused to update the model multiple times.
arXiv Detail & Related papers (2023-05-10T14:00:50Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Multi-Objective Latent Space Optimization of Generative Molecular Design Models [3.1996400013865656]
We propose a multi-objective latent space optimization (LSO) method that can significantly enhance the performance of generative molecular design (GMD).
We demonstrate that our multi-objective GMD LSO method can significantly improve the performance of GMD for jointly optimizing multiple molecular properties.
arXiv Detail & Related papers (2022-03-01T15:12:05Z)
- CELLS: Cost-Effective Evolution in Latent Space for Goal-Directed Molecular Generation [23.618366377098614]
We propose a cost-effective evolution strategy in latent space, which optimizes the molecular latent representation vectors.
We adopt a pre-trained molecular generative model to map the latent and observation spaces.
We conduct extensive experiments on multiple optimization tasks comparing the proposed framework to several advanced techniques.
arXiv Detail & Related papers (2021-11-30T11:02:18Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)