EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
- URL: http://arxiv.org/abs/2411.00171v1
- Date: Thu, 31 Oct 2024 19:33:21 GMT
- Title: EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
- Authors: Mujin Cheon, Jay H. Lee, Dong-Yeun Koh, Calvin Tsay
- Abstract summary: This paper presents a novel reinforcement learning (RL)-based framework for multi-step lookahead BO in high-dimensional black-box optimization problems.
We first introduce an Attention-DeepSets encoder to represent the state of knowledge to the RL agent and employ off-policy learning to accelerate its initial training.
We then propose a multi-task fine-tuning procedure based on end-to-end (encoder-RL) on-policy learning.
- Score: 1.8655559150764562
- Abstract: Conventional methods for Bayesian optimization (BO) primarily involve one-step optimal decisions (e.g., maximizing expected improvement of the next step). To avoid myopic behavior, multi-step lookahead BO algorithms such as rollout strategies consider the sequential decision-making nature of BO, i.e., as a stochastic dynamic programming (SDP) problem, demonstrating promising results in recent years. However, owing to the curse of dimensionality, most of these methods make significant approximations or suffer scalability issues, e.g., being limited to two-step lookahead. This paper presents a novel reinforcement learning (RL)-based framework for multi-step lookahead BO in high-dimensional black-box optimization problems. The proposed method enhances the scalability and decision-making quality of multi-step lookahead BO by efficiently solving the SDP of the BO process in a near-optimal manner using RL. We first introduce an Attention-DeepSets encoder to represent the state of knowledge to the RL agent and employ off-policy learning to accelerate its initial training. We then propose a multi-task, fine-tuning procedure based on end-to-end (encoder-RL) on-policy learning. We evaluate the proposed method, EARL-BO (Encoder Augmented RL for Bayesian Optimization), on both synthetic benchmark functions and real-world hyperparameter optimization problems, demonstrating significantly improved performance compared to existing multi-step lookahead and high-dimensional BO methods.
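To make the encoder idea concrete, below is a minimal PyTorch sketch of an Attention-DeepSets-style encoder: each observed (x, y) pair is embedded pointwise, self-attention mixes information across observations, and permutation-invariant pooling yields a fixed-size "state of knowledge" for the RL agent. Layer sizes, the pooling choice, and all module names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionDeepSetsEncoder(nn.Module):
    """Sketch of an Attention-DeepSets encoder for the BO state of knowledge:
    a permutation-invariant embedding of the set of observed (x, y) pairs.
    Sizes and pooling are assumptions, not the paper's exact design."""

    def __init__(self, dim_x, dim_embed=64, n_heads=4):
        super().__init__()
        self.phi = nn.Sequential(                 # per-point encoder (DeepSets phi)
            nn.Linear(dim_x + 1, dim_embed), nn.ReLU(),
            nn.Linear(dim_embed, dim_embed),
        )
        self.attn = nn.MultiheadAttention(dim_embed, n_heads, batch_first=True)
        self.rho = nn.Sequential(                 # post-pooling head (DeepSets rho)
            nn.Linear(dim_embed, dim_embed), nn.ReLU(),
            nn.Linear(dim_embed, dim_embed),
        )

    def forward(self, X, y):
        # X: (batch, n_obs, dim_x), y: (batch, n_obs, 1)
        h = self.phi(torch.cat([X, y], dim=-1))   # embed each observation
        h, _ = self.attn(h, h, h)                 # self-attention across the set
        state = h.mean(dim=1)                     # mean-pool -> permutation invariant
        return self.rho(state)                    # fixed-size state for the RL agent

# toy usage: 8 observations of a 10-dimensional black-box function
enc = AttentionDeepSetsEncoder(dim_x=10)
X, y = torch.randn(1, 8, 10), torch.randn(1, 8, 1)
print(enc(X, y).shape)  # torch.Size([1, 64])
```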
Related papers
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
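For context, here is a minimal sketch of the vanilla DPO objective that APO-style accelerated methods build on; the acceleration scheme itself (the paper's contribution) is not reproduced, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Margin between policy and reference log-ratios on the preferred (w)
    # vs. dispreferred (l) response; minimising this loss implements DPO.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# toy summed response log-probs under the policy and a frozen reference model
pol_w, pol_l, ref_w, ref_l = torch.randn(4, 8).unbind(0)
print(dpo_loss(pol_w, pol_l, ref_w, ref_l))
```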
- Reinforced In-Context Black-Box Optimization [64.25546325063272]
RIBBO is a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion.
RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks.
Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret over the future part of the histories.
arXiv Detail & Related papers (2024-02-27T11:32:14Z)
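Reading the abstract literally, a regret-to-go token at step t is the cumulative regret over the remaining part of the history. A hedged NumPy sketch follows; the optimum `f_best` is assumed known for offline data, and the exact token definition may differ from the paper's.

```python
import numpy as np

def regret_to_go(history_y, f_best):
    # per-step simple regret against the (known) optimum f_best
    step_regret = f_best - np.asarray(history_y, dtype=float)
    # suffix sum: cumulative regret still to be incurred from step t onward
    return np.cumsum(step_regret[::-1])[::-1]

print(regret_to_go([0.2, 0.5, 0.9], f_best=1.0))  # [1.4, 0.6, 0.1]
```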
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
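As a reference point, here is the textbook OWA aggregation the paper works with: outcomes are sorted and combined with fixed weights, and fairness comes from emphasizing the worst-off objective. This is a sketch of the standard definition only, not the paper's end-to-end differentiable pipeline.

```python
import numpy as np

def owa(utilities, weights):
    # sort outcomes ascending so the worst-off objective comes first;
    # non-increasing weights then emphasise fairness toward the worst-off
    u = np.sort(np.asarray(utilities, dtype=float))
    return float(np.dot(weights, u))

# largest weight on the worst outcome -> fair (worst-case-leaning) aggregation
print(owa([0.9, 0.4, 0.7], weights=[0.6, 0.3, 0.1]))  # 0.54
```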
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z)
- Neuromorphic Bayesian Optimization in Lava [0.0]
We introduce Lava Bayesian Optimization (LavaBO) as a contribution to the open-source Lava Software Framework.
LavaBO is the first step towards developing a BO system compatible with heterogeneous, fine-grained parallel, in-memory neuromorphic computing architectures.
We evaluate the algorithmic performance of the LavaBO system on multiple problems, such as training state-of-the-art spiking neural networks through back-propagation and evolutionary learning.
arXiv Detail & Related papers (2023-05-18T15:54:23Z)
- BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach [46.457298683984924]
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems.
Conventional methods need to differentiate through the lower-level optimization process with implicit differentiation.
First-order BO depends only on first-order information and requires no implicit differentiation.
arXiv Detail & Related papers (2022-09-19T01:51:12Z)
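In the spirit of that idea, below is a hedged PyTorch sketch of a value-function-style first-order bilevel update: approximate the lower-level solution with plain gradient descent, then penalize lower-level suboptimality instead of differentiating through the inner solver. Step counts, step sizes, and the penalty weight are illustrative, not BOME's exact rules.

```python
import torch

def first_order_bilevel_step(x, y, f, g, inner_steps=10, eta=0.05, lam=0.5):
    # 1) approximate the lower-level minimiser of g(x, .) by gradient descent
    y_hat = y.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):
        (gy,) = torch.autograd.grad(g(x.detach(), y_hat), y_hat)
        y_hat = (y_hat - eta * gy).detach().requires_grad_(True)

    # 2) first-order surrogate: f plus a penalty on lower-level suboptimality
    #    g(x, y) - g(x, y_hat); no Jacobian/Hessian-vector products needed
    x_ = x.detach().clone().requires_grad_(True)
    y_ = y.detach().clone().requires_grad_(True)
    loss = f(x_, y_) + lam * (g(x_, y_) - g(x_, y_hat.detach()))
    gx, gy = torch.autograd.grad(loss, (x_, y_))
    return x.detach() - eta * gx, y.detach() - eta * gy

# toy coupled quadratics
f = lambda x, y: ((x - 1) ** 2 + (y - x) ** 2).sum()   # upper level
g = lambda x, y: ((y - 2 * x) ** 2).sum()              # lower level
x, y = torch.tensor([0.0]), torch.tensor([0.0])
for _ in range(100):
    x, y = first_order_bilevel_step(x, y, f, g)
print(x, y)
```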
- Multi-Agent Deep Reinforcement Learning in Vehicular OCC [14.685237010856953]
We introduce a spectral efficiency optimization approach in vehicular OCC.
We model the optimization problem as a Markov decision process (MDP) to enable the use of solutions that can be applied online.
We verify the performance of our proposed scheme through extensive simulations and compare it with various variants of our approach and a random method.
arXiv Detail & Related papers (2022-05-05T14:25:54Z)
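Once cast as an MDP, the problem can be attacked with standard RL machinery. Below is a generic tabular Q-learning skeleton for illustration only; the paper itself uses multi-agent deep RL, and the `step` function here is a hypothetical stand-in for the vehicular-OCC environment with reward taken as spectral efficiency.

```python
import numpy as np

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(Q[s].argmax())
            s_next, reward, done = step(s, a)   # environment transition
            # temporal-difference update toward the Bellman target
            target = reward + gamma * Q[s_next].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```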
- Teaching Networks to Solve Optimization Problems [13.803078209630444]
We propose to replace the iterative solvers altogether with a trainable parametric set function.
We show the feasibility of learning such parametric (set) functions to solve various classic optimization problems.
arXiv Detail & Related papers (2022-02-08T19:13:13Z)
- ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations.
We propose a novel BO algorithm, which adopts an Evolution Strategies (ES)-based method to approximate the response Jacobian matrix in the hypergradient of BO.
arXiv Detail & Related papers (2021-10-13T19:36:50Z)
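The underlying trick is the standard antithetic ES estimator: since (h(y+σε) − h(y−σε))/(2σ) ≈ J_h(y)ε, averaging against ε recovers Jacobian-vector products from zeroth-order evaluations alone. A generic NumPy sketch of that estimator, not the paper's exact one:

```python
import numpy as np

def es_jvp(h, y, v, sigma=1e-2, n_samples=256, seed=0):
    # antithetic ES estimate of J_h(y) @ v using only evaluations of h
    rng = np.random.default_rng(seed)
    est = np.zeros_like(np.asarray(h(y), dtype=float))
    for _ in range(n_samples):
        eps = rng.standard_normal(y.size)
        finite_diff = (h(y + sigma * eps) - h(y - sigma * eps)) / (2 * sigma)
        est += finite_diff * (eps @ v)   # E[eps eps^T] = I  =>  E[...] = J v
    return est / n_samples

h = lambda y: np.array([y[0] ** 2, y[0] * y[1]])   # toy response map
y, v = np.array([1.0, 2.0]), np.array([1.0, 0.0])
print(es_jvp(h, y, v))   # approx J @ v = [[2, 0], [2, 1]] @ [1, 0] = [2, 2]
```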
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees [28.46586066038317]
We provide the first efficient implementation of general multi-step lookahead Bayesian optimization.
Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly.
We demonstrate that multi-step expected improvement is tractable and exhibits performance superior to existing methods on a wide range of benchmarks.
arXiv Detail & Related papers (2020-06-29T02:17:18Z)
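A compact illustration of the one-shot idea for the two-step case: fantasy outcomes at x1 are reparameterised with fixed base samples, each fantasy gets its own second-stage candidate x2, and all decision variables are optimised jointly rather than in a nested loop. This is a simplified NumPy/SciPy sketch under strong assumptions (1-D inputs, fixed unit-scale RBF kernel, known hyperparameters), not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ls**2)

def gp_posterior(X, y, x, noise=1e-6):
    # GP posterior mean/variance at one point x under a unit-scale RBF kernel
    Kinv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    ks = rbf(x[None, :], X)
    return float(ks @ Kinv @ y), max(float(1.0 - ks @ Kinv @ ks.T), 1e-12)

def ei(mu, var, best):
    # expected improvement over the incumbent `best` (maximisation)
    s = np.sqrt(var)
    z = (mu - best) / s
    return (mu - best) * norm.cdf(z) + s * norm.pdf(z)

def neg_two_step_value(theta, X, y, z_base, d):
    x1, x2 = theta[:d], theta[d:].reshape(len(z_base), d)  # one x2 per fantasy
    best = y.max()
    mu1, var1 = gp_posterior(X, y, x1)
    value = ei(mu1, var1, best)
    for zk, x2k in zip(z_base, x2):
        y_fant = mu1 + np.sqrt(var1) * zk                  # reparameterised fantasy
        Xf, yf = np.vstack([X, x1]), np.append(y, y_fant)
        mu2, var2 = gp_posterior(Xf, yf, x2k)
        value += ei(mu2, var2, max(best, y_fant)) / len(z_base)
    return -value

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 1)); y = np.sin(6 * X[:, 0])
z_base = norm.ppf((np.arange(1, 5) - 0.5) / 4)             # 4 fixed fantasy draws
theta0 = rng.uniform(size=5)                               # x1 plus four x2's, all 1-D
res = minimize(neg_two_step_value, theta0, args=(X, y, z_base, 1),
               bounds=[(0.0, 1.0)] * 5)
print("two-step-optimal next query x1 =", res.x[:1])
```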