Related papers: Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design

URL: http://arxiv.org/abs/2309.13957v2
Date: Sun, 3 Mar 2024 16:23:00 GMT
Title: Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design
Authors: Jeff Guo, Philippe Schwaller
Abstract summary: Generative molecular design has moved from proof-of-concept to real-world applicability. Key challenges in explainability and sample efficiency present opportunities to enhance generative design. Beam Enumeration is generally applicable to any language-based molecular generative model.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted. When coupled with reinforcement learning, extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the performance of the recently reported Augmented Memory algorithm, which achieved the new state-of-the-art on the Practical Molecular Optimization benchmark for sample efficiency. Given a fixed oracle budget, the combined algorithm generates more high-reward molecules, and generates them faster. Beam Enumeration shows that improvements to explainability and sample efficiency for molecular design can be made synergistic.
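The enumeration and self-conditioning steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions:

- **Enumeration.** As we understand the method, the top-k tokens are expanded at each of n generation steps without pruning, which yields k**n sub-sequences.
- **Language model.** `toy_next_token_probs` is a hypothetical stand-in for a trained SMILES language model.
- **Substructure extraction.** Plain substring counting replaces the chemistry-aware extraction, which would normally use a cheminformatics toolkit such as RDKit.
- **Helper names.** `beam_enumerate`, `count_substrings` and `self_condition` are illustrative names only.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary of SMILES tokens.
VOCAB = ["C", "c", "1", "O", "N", "(", ")", "="]


def toy_next_token_probs(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained language model:
    returns P(next token | prefix) over VOCAB."""
    weights = {tok: 1.0 for tok in VOCAB}
    weights["C"] = 4.0  # aliphatic carbon is the most likely token
    # Aromatic carbons become more likely right after another aromatic carbon.
    weights["c"] = 3.0 if prefix.endswith("c") else 1.5
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def beam_enumerate(
    next_probs: Callable[[str], Dict[str, float]], k: int, n: int
) -> List[Tuple[str, float]]:
    """Expand the top-k tokens at each of n steps with no pruning,
    returning all k**n sub-sequences with their log-probabilities."""
    beams: List[Tuple[str, float]] = [("", 0.0)]
    for _ in range(n):
        expanded = []
        for prefix, logp in beams:
            probs = next_probs(prefix)
            top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
            for tok, p in top:
                expanded.append((prefix + tok, logp + math.log(p)))
        beams = expanded  # unlike beam search, every branch is kept
    return beams


def count_substrings(seqs: List[str], min_len: int) -> Counter:
    """Count, once per sequence, every substring of length >= min_len
    (a crude proxy for substructure extraction)."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(
            {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
        )
    return counts


def self_condition(samples: List[str], substructures: List[str]) -> List[str]:
    """Keep only sampled molecules that contain a frequent substructure."""
    return [s for s in samples if any(sub in s for sub in substructures)]


beams = beam_enumerate(toy_next_token_probs, k=2, n=3)
counts = count_substrings([s for s, _ in beams], min_len=2)
top = [sub for sub, _ in counts.most_common(2)]
kept = self_condition(["Cc1ccccc1", "CCO", "OCC(=O)O"], top)
print(len(beams), sorted(top), kept)  # prints: 8 ['Cc', 'cC'] ['Cc1ccccc1']
```

With k=2 and n=3, the sketch keeps all 2**3 = 8 sub-sequences rather than the k best, which is what distinguishes exhaustive enumeration from ordinary beam search. The most frequent fragments are then used as a filter on newly sampled molecules.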

Related papers

UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion
Existing approaches rely on either autoregressive sequence models or diffusion models. We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models. We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv (2025-03-09T16:43:07Z)
Pathway-Guided Optimization of Deep Generative Molecular Design Models for Cancer Therapy
The junction tree variational autoencoder (JTVAE) has been shown to be an efficient generative model. We show how a pharmacodynamic model, assessing the therapeutic efficacy of a drug-like small molecule, can be incorporated for effective latent space optimization.
arXiv (2024-11-05T19:20:30Z)
Cliqueformer: Model-Based Optimization with Structured Transformers
Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems. We present Cliqueformer, a transformer-based architecture that learns the black-box function's structure through functional graphical models (FGM). Across various domains, including chemical and genetic design tasks, Cliqueformer demonstrates superior performance compared to existing methods.
arXiv (2024-10-17T00:35:47Z)
MING: A Functional Approach to Learning Molecular Generative Models
This paper introduces a novel paradigm for learning molecule generative models based on functional representations. We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space.
arXiv (2024-10-16T13:02:02Z)
Chemistry-Inspired Diffusion with Non-Differentiable Guidance
Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. We propose a novel approach that leverages domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry.
arXiv (2024-10-09T03:10:21Z)
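The idea of guiding generation with estimated gradients from a non-differentiable oracle can be illustrated with a generic zeroth-order estimator. The sketch rests on these assumptions:

- **Estimator.** It uses antithetic Gaussian-smoothing finite differences, a standard technique swapped in for illustration. The entry above does not say which estimator the paper uses.
- **Oracle.** `toy_oracle` is a hypothetical black-box score, not a quantum-chemistry calculation.
- **Guided update.** `guided_step` is a hypothetical rule that simply adds the scaled gradient estimate to an unconditional update.

```python
import random
from typing import Callable, List

Vector = List[float]


def estimate_gradient(
    oracle: Callable[[Vector], float],
    x: Vector,
    sigma: float = 1e-2,
    num_samples: int = 4000,
    seed: int = 0,
) -> Vector:
    """Antithetic Gaussian-smoothing estimate of the gradient of a
    black-box `oracle` at `x`, using only function evaluations."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]  # random probe direction
        plus = oracle([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = oracle([xi - sigma * ui for xi, ui in zip(x, u)])
        scale = (plus - minus) / (2.0 * sigma * num_samples)
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad


def guided_step(
    x: Vector,
    denoise_update: Vector,
    oracle: Callable[[Vector], float],
    guidance_scale: float,
) -> Vector:
    """Hypothetical guided update: add the scaled estimated oracle
    gradient to the update proposed by an unconditional denoiser."""
    grad = estimate_gradient(oracle, x)
    return [
        xi + di + guidance_scale * gi
        for xi, di, gi in zip(x, denoise_update, grad)
    ]


def toy_oracle(x: Vector) -> float:
    """Hypothetical oracle with its maximum at (1, 1); its true
    gradient at the origin is (2, 2)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


g = estimate_gradient(toy_oracle, [0.0, 0.0])
x_next = guided_step([0.0, 0.0], [0.0, 0.0], toy_oracle, guidance_scale=0.1)
```

For the toy quadratic, `g` lands close to the true gradient (2, 2). The estimator never differentiates the oracle and only calls it at perturbed points, which is what makes this kind of guidance usable with black-box scoring functions.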
Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties. It can generate molecules with state-of-the-art binding energies, reaching an average Vina score of up to -7.07.
arXiv (2024-07-01T06:10:29Z)
DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model. We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv (2024-03-07T02:53:40Z)
Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability
The integration of explainable methods to elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. We trained 20 GNN models on a dataset of small molecules with the goal of predicting their activity on 20 distinct protein targets from the Kinase family. We implemented the Hierarchical Grad-CAM graph Explainer framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization.
arXiv (2024-01-29T17:23:25Z)
Protein Design with Guided Discrete Diffusion
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv (2023-05-31T16:31:24Z)
Augmented Memory: Capitalizing on Experience Replay to Accelerate De Novo Molecular Design
Molecular generative models should learn to satisfy a desired objective under minimal oracle evaluations. We propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times.
arXiv (2023-05-10T14:00:50Z)
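The reuse of oracle scores described in the Augmented Memory entry can be sketched as follows. The sketch loosely follows the abstract: each sampled molecule is scored once and kept in a top-N replay buffer, and the cached scores are then reused for several model updates on augmented SMILES. It is not the authors' code, and it rests on these assumptions:

- **Augmentation.** `augment` is a hypothetical callable. A real implementation would randomize SMILES, for example with RDKit's `Chem.MolToSmiles(mol, doRandom=True)`.
- **Model update.** `update` is a hypothetical callable. A real implementation would compute a policy loss and step the model.
- **Helper names.** `ReplayBuffer` and `augmented_memory_step` are illustrative names only.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Scored = List[Tuple[str, float]]


class ReplayBuffer:
    """Keeps the `capacity` highest-scoring unique SMILES seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.scores: Dict[str, float] = {}

    def add(self, batch: Scored) -> None:
        for smiles, score in batch:
            self.scores[smiles] = max(score, self.scores.get(smiles, float("-inf")))
        best = heapq.nlargest(self.capacity, self.scores.items(), key=lambda kv: kv[1])
        self.scores = dict(best)  # highest score first

    def items(self) -> Scored:
        return list(self.scores.items())


def augmented_memory_step(
    sampled: List[str],
    oracle: Callable[[str], float],
    buffer: ReplayBuffer,
    augment: Callable[[str], str],
    update: Callable[[Scored], None],
    rounds: int,
) -> None:
    # The only oracle calls in this step: one per sampled molecule.
    scored = [(s, oracle(s)) for s in sampled]
    buffer.add(scored)
    # Reuse the cached scores for several updates on augmented strings.
    for _ in range(rounds):
        batch = [(augment(s), score) for s, score in scored + buffer.items()]
        update(batch)


# Usage with hypothetical stand-ins for the oracle, augmentation and update.
oracle_calls: List[str] = []
updates: List[Scored] = []


def toy_oracle(smiles: str) -> float:
    """Hypothetical oracle that records each call and scores by length."""
    oracle_calls.append(smiles)
    return float(len(smiles))


buffer = ReplayBuffer(capacity=2)
augmented_memory_step(["CCO", "CCCC", "C"], toy_oracle, buffer,
                      augment=lambda s: s, update=updates.append, rounds=3)
print(len(oracle_calls), len(updates), buffer.items())
# prints: 3 3 [('CCCC', 4.0), ('CCO', 3.0)]
```

Three oracle calls feed three model updates in this example. The point of the design is that the expensive oracle budget is spent once, while the comparatively cheap model updates are repeated.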
Retrieval-based Controllable Molecule Generation
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv (2022-08-23T17:01:16Z)
Learning Neural Generative Dynamics for Molecular Conformation Generation
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph. We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv (2021-02-20T03:17:58Z)
