Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Efficient De Novo Molecular Design
- URL: http://arxiv.org/abs/2512.17659v1
- Date: Fri, 19 Dec 2025 14:59:27 GMT
- Title: Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Efficient De Novo Molecular Design
- Authors: Madhav R. Muthyala, Farshud Sorourifar, Tianhong Tan, You Peng, Joel A. Paulson,
- Abstract summary: This work introduces an alternative, modular "generate-then-optimize" framework for de novo molecular design/discovery.<n>We benchmark the framework against state-of-the-art latent-space and discrete molecular optimization methods.<n>Specifically, in a case study related to sustainable energy storage, we show that our approach quickly uncovers novel, diverse, and high-performing organic (quinone-based) cathode materials.
- Score: 1.8517039579627974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing molecules that must satisfy multiple, often conflicting objectives is a central challenge in molecular discovery. The enormous size of chemical space and the cost of high-fidelity simulations have driven the development of machine learning-guided strategies for accelerating design with limited data. Among these, Bayesian optimization (BO) offers a principled framework for sample-efficient search, while generative models provide a mechanism to propose novel, diverse candidates beyond fixed libraries. However, existing methods that couple the two often rely on continuous latent spaces, which introduces both architectural entanglement and scalability challenges. This work introduces an alternative, modular "generate-then-optimize" framework for de novo multi-objective molecular design/discovery. At each iteration, a generative model is used to construct a large, diverse pool of candidate molecules, after which a novel acquisition function, qPMHI (multi-point Probability of Maximum Hypervolume Improvement), is used to optimally select a batch of candidates most likely to induce the largest Pareto front expansion. The key insight is that qPMHI decomposes additively, enabling exact, scalable batch selection via only simple ranking of probabilities that can be easily estimated with Monte Carlo sampling. We benchmark the framework against state-of-the-art latent-space and discrete molecular optimization methods, demonstrating significant improvements across synthetic benchmarks and application-driven tasks. Specifically, in a case study related to sustainable energy storage, we show that our approach quickly uncovers novel, diverse, and high-performing organic (quinone-based) cathode materials for aqueous redox flow battery applications.
Related papers
- Multi-Objective Coverage via Constraint Active Search [8.977841680385627]
We formulate the new multi-objective coverage (MOC) problem where our goal is to identify a small set of representative samples.<n>We propose a novel search algorithm, MOC-CAS, which employs an upper confidence bound-based acquisition function to select optimistic samples.<n>Compared to the competitive baselines, we show that our MOC-CAS empirically achieves superior performances across large-scale protein-target datasets for SARS-CoV-2 and cancer.
arXiv Detail & Related papers (2026-02-17T14:07:16Z) - Reward driven discovery of the optimal microstructure representations with invariant variational autoencoders [0.015295722752489374]
Variational Autoencoders (VAEs) provide a powerful means of constructing such low-dimensional representations.<n>VAEs are often optimized through trial-and-error and empirical analysis.<n>We investigated reward-based strategies for evaluating latent space representations.
arXiv Detail & Related papers (2025-09-30T20:15:42Z) - Bayesian Optimization for Molecules Should Be Pareto-Aware [6.877358955271652]
Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design.<n>We benchmark a simple MOBO strategy against a simple fixed-weight scalarized baseline using Expected Improvement (EI)<n>Our results show that even strong deterministic instantiations can underperform in low-data regimes.
arXiv Detail & Related papers (2025-07-18T07:12:19Z) - Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design [58.8094854658848]
We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design.<n>We propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions.<n>Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods.
arXiv Detail & Related papers (2025-07-01T05:55:28Z) - Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties.
It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z) - Diffusion Model for Data-Driven Black-Box Optimization [54.25693582870226]
We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization.
We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons.
Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
arXiv Detail & Related papers (2024-03-20T00:41:12Z) - DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and diffusion model.
We show that DecompOpt can efficiently generate molecules with improved properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z) - Accelerating Black-Box Molecular Property Optimization by Adaptively
Learning Sparse Subspaces [0.0]
We show that our proposed method substantially outperforms existing MPO methods on a variety of benchmark and real-world problems.
Specifically, we show that our method can routinely find near-optimal molecules out of a set of more than $>100$k alternatives within 100 or fewer expensive queries.
arXiv Detail & Related papers (2024-01-02T18:34:29Z) - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z) - Sample-efficient Multi-objective Molecular Optimization with GFlowNets [5.030493242666028]
We propose a multi-objective Bayesian optimization (MOBO) algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN)
Using a single preference-conditioned hypernetwork, HN-GFN learns to explore various trade-offs between objectives.
Experiments in various real-world settings demonstrate that our framework predominantly outperforms existing methods in terms of candidate quality and sample efficiency.
arXiv Detail & Related papers (2023-02-08T13:30:28Z) - Multi-Agent Reinforcement Learning for Microprocessor Design Space
Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.