Bayesian Optimization for Molecules Should Be Pareto-Aware
- URL: http://arxiv.org/abs/2507.13704v2
- Date: Mon, 21 Jul 2025 10:52:25 GMT
- Title: Bayesian Optimization for Molecules Should Be Pareto-Aware
- Authors: Anabel Yong, Austin Tripp, Layla Hosseini-Gerami, Brooks Paige
- Abstract summary: Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. We benchmark a simple Pareto-based MOBO strategy against a simple fixed-weight scalarized baseline using Expected Improvement (EI). Our results show that even strong deterministic instantiations can underperform in low-data regimes.
- Score: 6.877358955271652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-objective Bayesian optimization (MOBO) provides a principled framework for navigating trade-offs in molecular design. However, its empirical advantages over scalarized alternatives remain underexplored. We benchmark a simple Pareto-based MOBO strategy -- Expected Hypervolume Improvement (EHVI) -- against a simple fixed-weight scalarized baseline using Expected Improvement (EI), under a tightly controlled setup with identical Gaussian Process surrogates and molecular representations. Across three molecular optimization tasks, EHVI consistently outperforms scalarized EI in terms of Pareto front coverage, convergence speed, and chemical diversity. While scalarization encompasses flexible variants -- including random or adaptive schemes -- our results show that even strong deterministic instantiations can underperform in low-data regimes. These findings offer concrete evidence for the practical advantages of Pareto-aware acquisition in de novo molecular optimization, especially when evaluation budgets are limited and trade-offs are nontrivial.
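To make the comparison concrete, below is a minimal NumPy sketch (our illustration, not the authors' code) of the two acquisition functions: fixed-weight scalarized EI and a Monte-Carlo estimate of EHVI for two maximization objectives. The posterior means and standard deviations are assumed to come from independent per-objective GP surrogates; all function names and toy numbers are illustrative.

```python
# Minimal sketch (not the paper's implementation) of the two acquisition
# functions compared in the abstract, for two maximization objectives.
# Posterior means/stds (mu, sigma) are assumed to come from independent
# per-objective GP surrogates; all names and numbers are illustrative.
import numpy as np
from scipy.stats import norm

def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D maximization front w.r.t. ref."""
    front = front[(front[:, 0] > ref[0]) & (front[:, 1] > ref[1])]
    pts = front[np.argsort(-front[:, 0])]   # sort by objective 1, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:                      # dominated points add nothing
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def scalarized_ei(mu, sigma, best, w=(0.5, 0.5)):
    """EI on a fixed-weight sum of the two objectives (the baseline)."""
    m = w[0] * mu[:, 0] + w[1] * mu[:, 1]
    s = np.sqrt((w[0] * sigma[:, 0]) ** 2 + (w[1] * sigma[:, 1]) ** 2)
    z = (m - best) / np.maximum(s, 1e-12)
    return s * (z * norm.cdf(z) + norm.pdf(z))

def mc_ehvi(mu, sigma, front, ref, n_samples=256, seed=0):
    """Monte-Carlo EHVI: mean hypervolume gain over posterior draws."""
    rng = np.random.default_rng(seed)
    base = hypervolume_2d(front, ref)
    gain = np.zeros(len(mu))
    for _ in range(n_samples):
        y = rng.normal(mu, sigma)           # one joint posterior sample
        for i in range(len(mu)):
            aug = np.vstack([front, y[i]])
            gain[i] += max(hypervolume_2d(aug, ref) - base, 0.0)
    return gain / n_samples

# Toy candidates: one filling the gap in the front, one near its edge.
mu = np.array([[2.5, 2.5], [3.5, 0.5]])
sigma = np.full_like(mu, 0.3)
front = np.array([[3.0, 1.0], [1.0, 3.0]])  # current Pareto front
ref = np.array([0.0, 0.0])
print(scalarized_ei(mu, sigma, best=2.0))   # best scalarized value so far
print(mc_ehvi(mu, sigma, front, ref))
```

The qualitative difference the paper exploits: EHVI scores a candidate by how much it would grow the dominated hypervolume, so it rewards filling gaps in the front, whereas a fixed-weight scalarization can only push along one direction of the trade-off.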
Related papers
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
- Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces [13.38402522324075]
We propose a decoupled approach that trains a generative model and a Gaussian Process (GP) surrogate separately, then combines them via a simple yet principled Bayesian update rule. We show that our decoupled approach improves our ability to identify high-potential candidates in molecular optimisation problems under constrained evaluation budgets.
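A rough sketch of how such a decoupled generate-then-reweight step can look in code (our reading; `decode` and `gp_prob_improvement` are hypothetical stand-ins): sample candidates from the generative prior, then resample them in proportion to the GP's probability that they improve on the incumbent, realizing a Bayes-rule combination p(x | promising) ∝ p(x) · P(promising | x).

```python
# Hedged sketch of a decoupled generate-then-reweight step; `decode` and
# `gp_prob_improvement` are hypothetical stand-ins for a pretrained VAE
# decoder and a separately trained GP surrogate.
import numpy as np

def resample_candidates(decode, gp_prob_improvement, n_prior=1024,
                        n_keep=16, latent_dim=32, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_prior, latent_dim))  # draws from the VAE prior
    cands = decode(z)                               # samples from p(x)
    w = gp_prob_improvement(cands) + 1e-12          # P(promising | x) from GP
    w = w / w.sum()                                 # normalized posterior weights
    keep = rng.choice(n_prior, size=n_keep, replace=False, p=w)
    return [cands[i] for i in keep]
```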
arXiv Detail & Related papers (2025-07-05T05:53:04Z)
- Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
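A toy instantiation of the single-time-scale idea (an assumption-laden sketch, not the paper's algorithm): each step draws from the current Gaussian proposal, forms a likelihood-ratio-weighted gradient of E over xi ~ N(0,1) of F(x, xi), updates x, and re-tunes the proposal mean from the same mini-batch.

```python
# Toy sketch (our assumptions, not the paper's method): minimize
# E_{xi ~ N(0,1)}[F(x, xi)] with an IS gradient estimator whose Gaussian
# proposal mean is re-tuned every step from the same mini-batch, so both
# the decision variable x and the sampler move on a single time scale.
import numpy as np
from scipy.stats import norm

def F_grad(x, xi):                       # dF/dx for F(x, xi) = (x - xi)**2
    return 2.0 * (x - xi)

def is_sgd(steps=200, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x, mu_q = 3.0, 0.0                   # decision variable, proposal mean
    for _ in range(steps):
        xi = rng.normal(mu_q, 1.0, batch)               # sample proposal Q
        w = norm.pdf(xi, 0, 1) / norm.pdf(xi, mu_q, 1)  # likelihood ratios
        x -= lr * np.mean(w * F_grad(x, xi))            # unbiased SGD step
        # Crude proposal update: pick the candidate mean whose reweighted
        # estimator has the smallest empirical variance on this batch.
        cand = mu_q + np.array([-0.1, 0.0, 0.1])
        var = [np.var(norm.pdf(xi, 0, 1) / norm.pdf(xi, m, 1)
                      * F_grad(x, xi)) for m in cand]
        mu_q = cand[int(np.argmin(var))]
    return x, mu_q

print(is_sgd())  # x should approach 0, the minimizer of x**2 + 1
```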
arXiv Detail & Related papers (2025-04-04T16:10:18Z)
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
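For intuition, an invariance-aware kernel can be built by averaging a base kernel over the known symmetry group. The sketch below (our illustration, not the paper's code) makes an RBF kernel invariant to permutations of the input coordinates; one-sided averaging suffices here because the RBF is invariant under the simultaneous (orthogonal) group action.

```python
# Illustrative group-averaged kernel (not the paper's code): averaging an
# RBF kernel over a finite symmetry group G yields a kernel invariant to
# G acting on either argument. Here G is all permutations of the input.
import numpy as np
from itertools import permutations

def rbf(x, y, ls=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * ls ** 2))

def invariant_rbf(x, y, ls=1.0):
    x, y = np.asarray(x), np.asarray(y)
    return np.mean([rbf(x[list(p)], y, ls)
                    for p in permutations(range(len(x)))])

x, y = np.array([1.0, 2.0, 3.0]), np.array([3.0, 1.0, 2.0])
print(invariant_rbf(x, y))
print(invariant_rbf(x[::-1], y))  # same value: the kernel ignores ordering
```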
arXiv Detail & Related papers (2024-10-22T12:51:46Z)
- Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models [17.19004913553654]
Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs).
We propose a novel Composite Diffusion Model based Pareto Set Learning algorithm, namely CDM-PSL, for expensive MOBO.
Our proposed algorithm attains superior performance compared with various state-of-the-art MOBO algorithms.
arXiv Detail & Related papers (2024-05-14T14:55:57Z)
- Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
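As background for the filtering machinery, a stochastic EnKF analysis step (a generic textbook version, not the paper's GPSSM-specific scheme) can be sketched as follows:

```python
# Generic stochastic-EnKF analysis step (textbook version, not the
# paper's non-mean-field variational scheme): the Kalman gain is formed
# from ensemble sample covariances, and each member is updated against a
# perturbed observation so the analysis ensemble keeps the right spread.
import numpy as np

def enkf_update(X, y, H, R, seed=0):
    """X: (d, N) state ensemble; y: (m,) observation;
    H: (m, d) linear observation operator; R: (m, m) noise covariance."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)       # state anomalies
    HA = H @ A                                  # anomalies in obs space
    C_xh = A @ HA.T / (N - 1)                   # state-obs cross-covariance
    C_hh = HA @ HA.T / (N - 1)                  # obs-space covariance
    K = np.linalg.solve(C_hh + R, C_xh.T).T     # Kalman gain, shape (d, m)
    E = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    Y = y[:, None] + E                          # perturbed observations
    return X + K @ (Y - H @ X)                  # analysis ensemble

X = np.random.default_rng(1).standard_normal((2, 200))  # prior ensemble
H = np.array([[1.0, 0.0]])                      # observe first coordinate
X_a = enkf_update(X, y=np.array([1.0]), H=H, R=np.array([[0.1]]))
print(X_a.mean(axis=1))                         # mean pulled toward y
```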
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
- Pseudo-Bayesian Optimization [7.556071491014536]
We study an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence. We show how using simple local regression, and a suitable "randomized prior" construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks.
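The "randomized prior" idea can be sketched as follows (our reading, with hypothetical details; the paper's local-regression construction may differ): each ensemble member fits a simple local regressor to bootstrapped data plus its own fixed random prior function, and the spread across members serves as the uncertainty in a UCB-style acquisition.

```python
# Sketch of randomized-prior uncertainty for a UCB-style acquisition
# (our illustration, not the paper's code). Each ensemble member adds a
# fixed random linear "prior" to a k-NN fit on bootstrapped data; member
# disagreement plays the role of the predictive standard deviation.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def rp_ensemble_predict(X, y, X_query, n_members=10, beta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        w = rng.standard_normal(X.shape[1])
        prior = lambda Z, w=w: beta * Z @ w      # fixed random prior function
        idx = rng.integers(0, len(X), len(X))    # bootstrap resample
        reg = KNeighborsRegressor(n_neighbors=min(3, len(X)))
        reg.fit(X[idx], y[idx] - prior(X[idx]))  # fit residual to the prior
        preds.append(reg.predict(X_query) + prior(X_query))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

X = np.array([[0.0], [0.5], [1.0]]); y = np.array([0.1, 0.8, 0.3])
Xq = np.linspace(0, 1, 5)[:, None]
m, s = rp_ensemble_predict(X, y, Xq)
print(Xq[np.argmax(m + 2.0 * s)])               # optimistic next query point
```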
arXiv Detail & Related papers (2023-10-15T07:55:28Z)
- Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise.
This translates to quicker HPO convergence on empirical benchmarks.
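The conformalized-quantile-regression recipe itself is simple to sketch (generic CQR, not the paper's HPO wrapper): fit lower and upper quantile models, then widen the raw interval by a calibration-set conformity quantile so coverage holds without noise assumptions.

```python
# Generic conformalized quantile regression (CQR) sketch, not the
# paper's HPO implementation: quantile models give a raw interval, and a
# calibration-set correction yields ~90% coverage distribution-free.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cqr_interval(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.1):
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2)
    lo.fit(X_tr, y_tr); hi.fit(X_tr, y_tr)
    # conformity scores: how far calibration points fall outside the interval
    s = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
    k = int(np.ceil((1 - alpha) * (len(y_cal) + 1))) - 1
    q = np.sort(s)[min(k, len(s) - 1)]          # finite-sample correction
    return lo.predict(X_test) - q, hi.predict(X_test) + q

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (300, 1))
y = np.sin(6 * X[:, 0]) + 0.2 * rng.standard_normal(300)
l, u = cqr_interval(X[:150], y[:150], X[150:250], y[150:250], X[250:])
print(np.mean((l <= y[250:]) & (y[250:] <= u)))  # empirical coverage ~0.9
```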
arXiv Detail & Related papers (2023-05-05T15:33:39Z)
- A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences [55.25331349436895]
Deep generative models have emerged as a popular machine learning-based approach for inverse problems in the life sciences.
These problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution.
arXiv Detail & Related papers (2022-10-19T19:04:45Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Batched Data-Driven Evolutionary Multi-Objective Optimization Based on Manifold Interpolation [6.560512252982714]
We propose a framework for implementing batched data-driven evolutionary multi-objective optimization.
It is so general that any off-the-shelf evolutionary multi-objective optimization algorithms can be applied in a plug-in manner.
Our proposed framework features faster convergence and stronger resilience to various Pareto front (PF) shapes.
arXiv Detail & Related papers (2021-09-12T23:54:26Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.