Learning to Do or Learning While Doing: Reinforcement Learning and
Bayesian Optimisation for Online Continuous Tuning
- URL: http://arxiv.org/abs/2306.03739v1
- Date: Tue, 6 Jun 2023 14:56:47 GMT
- Title: Learning to Do or Learning While Doing: Reinforcement Learning and
Bayesian Optimisation for Online Continuous Tuning
- Authors: Jan Kaiser, Chenran Xu, Annika Eichler, Andrea Santamaria Garcia,
Oliver Stein, Erik Bründermann, Willi Kuropka, Hannes Dinter, Frank Mayet,
Thomas Vinatier, Florian Burkart, Holger Schlarb
- Abstract summary: We present a comparative study using a routine task in a real particle accelerator as an example.
Based on the study's results, we provide a clear set of criteria to guide the choice of algorithm for a given tuning task.
These can ease the adoption of learning-based autonomous tuning solutions in the operation of complex real-world plants.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online tuning of real-world plants is a complex optimisation problem that
continues to require manual intervention by experienced human operators.
Autonomous tuning is a rapidly expanding field of research, where
learning-based methods, such as Reinforcement Learning-trained Optimisation
(RLO) and Bayesian optimisation (BO), hold great promise for achieving
outstanding plant performance and reducing tuning times. Which algorithm to
choose in different scenarios, however, remains an open question. Here we
present a comparative study using a routine task in a real particle accelerator
as an example, showing that RLO generally outperforms BO, but is not always the
best choice. Based on the study's results, we provide a clear set of criteria
to guide the choice of algorithm for a given tuning task. These can ease the
adoption of learning-based autonomous tuning solutions in the operation of
complex real-world plants, ultimately improving the availability and pushing
the limits of operability of these facilities, thereby enabling scientific and
engineering advancements.
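To make the two paradigms concrete, below is a minimal, hypothetical sketch (not the paper's implementation; the toy plant, the stand-in policy, and all parameters are invented for illustration). An RLO-style controller is prepared beforehand and merely executed on the plant, whereas BO fits a Gaussian-process surrogate to the measurements it gathers while querying the plant.
```python
# Minimal sketch contrasting "learning to do" (RLO-style) and "learning while
# doing" (BO) on a toy tuning objective. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([0.3, -0.7])   # hypothetical optimal actuator settings (toy)

def plant(x):
    """Toy objective: higher is better, with a little measurement noise."""
    return -np.sum((x - TARGET) ** 2) + 1e-3 * rng.normal()

# "Learning to do": a controller prepared beforehand is simply executed for a
# few steps. A finite-difference proportional step stands in for the trained
# neural-network policy of RLO.
def rlo_tune(x0, steps=10):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        grad = np.array([(plant(x + 1e-2 * e) - plant(x - 1e-2 * e)) / 2e-2
                         for e in np.eye(x.size)])
        x = x + 0.3 * grad           # the policy's "action"
    return x, plant(x)

# "Learning while doing": a Gaussian-process surrogate is refitted to all
# measurements so far, and an acquisition function (UCB) picks the next probe.
def bo_tune(low=-1.0, high=1.0, iters=30, dim=2):
    X = rng.uniform(low, high, size=(3, dim))
    y = np.array([plant(x) for x in X])

    def kern(A, B, ls=0.25):
        d = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d / (2 * ls ** 2))

    for _ in range(iters):
        K = kern(X, X) + 1e-4 * np.eye(len(X))
        cand = rng.uniform(low, high, size=(256, dim))
        Ks = kern(cand, X)
        mu = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))
        x_next = cand[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, plant(x_next))
    best = np.argmax(y)
    return X[best], y[best]

print("RLO-style result:", rlo_tune([0.0, 0.0]))
print("BO result:       ", bo_tune())
```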
Related papers
- Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling [10.931466852026663]
We investigate the optimal use of trained deep reinforcement learning (DRL) agents during inference.
Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget.
We propose an algorithm for obtaining the optimal parameterization for such a given number of solutions and any given trained agent.
arXiv Detail & Related papers (2024-06-11T14:59:18Z)
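As a generic illustration of making inference-time use of a trained agent depend on the compute budget (a plain best-of-k sampling sketch with invented toy functions, not the parameterization proposed in the paper above):
```python
# Hypothetical sketch: spend an inference budget by sampling several stochastic
# rollouts from a trained policy and keeping the best one (generic best-of-k).
import random

def rollout(policy, env_reset, env_step, temperature=1.0):
    """Run one stochastic episode and return (total_reward, actions)."""
    state, done, total, actions = env_reset(), False, 0.0, []
    while not done:
        action = policy(state, temperature)      # sampling, not greedy argmax
        state, reward, done = env_step(state, action)
        total += reward
        actions.append(action)
    return total, actions

def best_of_k(policy, env_reset, env_step, budget_k=8, temperature=1.0):
    """Use the allowed budget of k episodes and return the best solution found."""
    results = [rollout(policy, env_reset, env_step, temperature)
               for _ in range(budget_k)]
    return max(results, key=lambda r: r[0])

# Toy usage: a 3-step task whose reward simply prefers larger sampled actions.
if __name__ == "__main__":
    def toy_reset():          return 0
    def toy_step(s, a):       return s + 1, float(a), s + 1 >= 3
    def toy_policy(s, temp):  return random.choice(range(10))
    print(best_of_k(toy_policy, toy_reset, toy_step, budget_k=16))
```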
- Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language [14.551969747057642]
We propose the use of large language models (LLMs) to tune particle accelerators.
We demonstrate the ability of LLMs to successfully and autonomously tune a particle accelerator subsystem based on nothing more than a natural language prompt from the operator.
In doing so, we also show how LLMs can perform numerical optimisation of a highly non-linear real-world objective function.
arXiv Detail & Related papers (2024-05-14T18:05:44Z)
- SPO: Sequential Monte Carlo Policy Optimisation [41.52684912140086]
We introduce SPO: Sequential Monte Carlo Policy optimisation.
We show that SPO provides robust policy improvement and efficient scaling properties.
We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
arXiv Detail & Related papers (2024-02-12T10:32:47Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
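A minimal sketch of the reward construction described in the RLIF summary above, assuming logged transitions carry a flag for expert interventions; the off-policy RL update that would consume these rewards is omitted:
```python
# Hypothetical sketch of an intervention-based reward: the agent is penalised
# whenever the human expert intervenes, and receives no task reward otherwise.
from dataclasses import dataclass

@dataclass
class Transition:
    obs: list
    action: list
    next_obs: list
    intervened: bool          # did the expert take over at this step?

def intervention_reward(t: Transition) -> float:
    """Reward is defined purely by the intervention signal."""
    return -1.0 if t.intervened else 0.0

def relabel(trajectory):
    """Attach intervention-based rewards to a logged trajectory for RL training."""
    return [(t.obs, t.action, intervention_reward(t), t.next_obs) for t in trajectory]
```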
- Mechanic: A Learning Rate Tuner [52.4242550204696]
We introduce a technique for automatically tuning the learning rate scale factor of any base optimization algorithm and schedule, which we call Mechanic.
We rigorously evaluate Mechanic on a range of large scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms.
arXiv Detail & Related papers (2023-05-31T19:32:43Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained in toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
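A toy illustration of the perturbation idea from the paper above, for the special case of a discrete argmax (a Monte Carlo estimate of a Gaussian-perturbed maximizer; the function names and parameters are illustrative, not the paper's general construction):
```python
# Toy sketch: a hard argmax over scores theta is piecewise constant, so its
# gradient is zero almost everywhere. Averaging the argmax of randomly
# perturbed scores yields a smooth, differentiable surrogate.
import numpy as np

rng = np.random.default_rng(0)

def hard_argmax(theta):
    """One-hot indicator of the best score: not usefully differentiable."""
    out = np.zeros_like(theta)
    out[np.argmax(theta)] = 1.0
    return out

def perturbed_argmax(theta, sigma=1.0, n_samples=1000):
    """Monte Carlo estimate of E[argmax(theta + sigma * Z)], Z ~ Gaussian.
    The expectation varies smoothly with theta, unlike the hard argmax."""
    Z = rng.normal(size=(n_samples, theta.size))
    hits = np.argmax(theta[None, :] + sigma * Z, axis=1)
    return np.bincount(hits, minlength=theta.size) / n_samples

theta = np.array([1.0, 1.2, 0.5])
print(hard_argmax(theta))        # e.g. [0. 1. 0.]
print(perturbed_argmax(theta))   # smooth weights summing to 1
```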
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)