Regret-Aware Black-Box Optimization with Natural Gradients,
Trust-Regions and Entropy Control
- URL: http://arxiv.org/abs/2206.06090v1
- Date: Tue, 24 May 2022 16:25:15 GMT
- Title: Regret-Aware Black-Box Optimization with Natural Gradients,
Trust-Regions and Entropy Control
- Authors: Maximilian Hüttenrauch, Gerhard Neumann
- Abstract summary: Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution.
While these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret.
In contrast, the Model-based Relative Entropy Stochastic Search (MORE) algorithm directly optimizes the expected fitness function without the use of rankings.
- Score: 17.430247457941284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most successful stochastic black-box optimizers, such as CMA-ES, use rankings
of the individual samples to obtain a new search distribution. Yet, the use of
rankings also introduces several issues: for example, the underlying
optimization objective is often unclear, i.e., we do not optimize the expected
fitness.
Further, while these algorithms typically produce a high-quality mean estimate
of the search distribution, the produced samples can have poor quality as these
algorithms are ignorant of the regret. Lastly, noisy fitness function
evaluations may result in solutions that are highly sub-optimal in expectation.
In contrast, stochastic optimizers that are motivated by policy gradients, such
as the Model-based Relative Entropy Stochastic Search (MORE) algorithm,
directly optimize the expected fitness function without the use of rankings.
MORE can be derived by applying natural policy gradients and compatible
function approximation, and it uses information-theoretic constraints to
ensure the stability of the policy update. While MORE does not suffer from the
listed limitations, it often cannot achieve state-of-the-art performance in
comparison to ranking-based methods. We improve MORE by decoupling the update
of the mean and covariance of the search distribution, which allows for more
aggressive updates on the mean while keeping the update on the covariance
conservative; by an improved entropy scheduling technique based on an
evolution path, which results in faster convergence; and by a simplified and
more effective model learning approach in comparison to the original paper.
We compare our algorithm to state-of-the-art black-box optimization
algorithms on standard
optimization tasks as well as on episodic RL tasks in robotics where it is also
crucial to have small regret. We obtain competitive results on benchmark
functions and clearly outperform ranking-based methods in terms of regret on
the RL tasks.
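To make the decoupled update concrete, the following Python sketch runs one MORE-style iteration on a Gaussian search distribution: a quadratic surrogate is fit to the sampled fitness values (in the spirit of compatible function approximation), the mean takes a comparatively aggressive natural-gradient-like step, and the covariance is updated conservatively. The surrogate fit, the step sizes `eta_mean` and `eta_cov`, and the positive-definiteness fallback are illustrative assumptions; the paper's actual update additionally enforces information-theoretic trust regions and an evolution-path-based entropy schedule.

```python
import numpy as np


def fit_quadratic_surrogate(X, y):
    """Least-squares fit of f(x) ~= x^T A x + b^T x + c with symmetric A."""
    n, d = X.shape
    feats = [np.ones(n)]
    feats += [X[:, i] for i in range(d)]
    feats += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    Phi = np.stack(feats, axis=1)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    b = w[1:1 + d]
    A = np.zeros((d, d))
    k = 1 + d
    for i in range(d):
        for j in range(i, d):
            A[i, j] = A[j, i] = w[k] / (1.0 if i == j else 2.0)
            k += 1
    return A, b


def decoupled_update(mean, cov, X, y, eta_mean=1.0, eta_cov=0.1):
    """One Gaussian update with an aggressive mean step and a conservative
    covariance step (step sizes are hypothetical, not the paper's)."""
    A, b = fit_quadratic_surrogate(X, y)
    grad = 2.0 * A @ mean + b                  # surrogate gradient at the mean
    new_mean = mean + eta_mean * cov @ grad    # natural-gradient-like step
    prec = np.linalg.inv(cov)
    target_prec = prec - 2.0 * eta_cov * A     # small step toward the curvature
    target_prec = 0.5 * (target_prec + target_prec.T)
    try:
        np.linalg.cholesky(target_prec)        # keep covariance positive definite
        new_cov = np.linalg.inv(target_prec)
    except np.linalg.LinAlgError:
        new_cov = cov                          # skip the covariance step this round
    return new_mean, new_cov


# Toy example: maximize a noisy concave quadratic fitness in 5 dimensions.
rng = np.random.default_rng(0)
d, n_samples = 5, 64
mean, cov = np.zeros(d), np.eye(d)
for _ in range(50):
    X = rng.multivariate_normal(mean, cov, size=n_samples)
    y = -np.sum((X - 3.0) ** 2, axis=1) + 0.1 * rng.standard_normal(n_samples)
    mean, cov = decoupled_update(mean, cov, X, y)
print(mean)  # approaches the optimum at [3, 3, 3, 3, 3]
```

Because the covariance step size is kept much smaller than the mean step size, the search distribution in this sketch contracts slowly and its entropy decreases gradually; the paper's entropy scheduling controls this contraction explicitly rather than leaving it to a fixed step size.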
Related papers
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than prior art.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z) - Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers like SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z) - On Adaptivity in Non-stationary Stochastic Optimization With Bandit
Feedback [11.208914594208654]
We show that, when the aggregated amount of function change is known a priori, a simple re-starting algorithm attains the optimal dynamic regret.
We also establish an additional result showing that any algorithm achieving good regret against stationary benchmarks with high probability could be automatically converted to an algorithm that achieves good regret against dynamic benchmarks.
arXiv Detail & Related papers (2022-10-11T16:16:34Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling (a minimal sketch of the underlying zeroth-order gradient estimator appears after this list).
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Meta Learning Black-Box Population-Based Optimizers [0.0]
We propose the use of meta-learning to infer population-based black-box optimizers.
We show that the meta-loss function encourages a learned algorithm to alter its search behavior so that it can easily fit into a new context.
arXiv Detail & Related papers (2021-03-05T08:13:25Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Better Parameter-free Stochastic Optimization with ODE Updates for
Coin-Betting [31.60239268539764]
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.
In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models.
We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
arXiv Detail & Related papers (2020-06-12T23:10:25Z) - Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
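As referenced in the ZO-RL entry above, here is a minimal Python sketch of the standard two-point zeroth-order gradient estimator that such derivative-free methods build on. The isotropic Gaussian perturbations, the smoothing radius `mu`, and the toy objective are illustrative assumptions; ZO-RL's contribution is precisely to replace the random direction sampling with a learned sampling policy.

```python
import numpy as np


def zo_gradient(f, x, rng, n_dirs=32, mu=1e-2):
    """Two-point zeroth-order estimate of the gradient of f at x.

    Averages finite differences along random Gaussian directions; a learned
    sampling policy (as in ZO-RL) would replace rng.standard_normal here.
    """
    d = x.shape[0]
    fx = f(x)
    grad = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        grad += (f(x + mu * u) - fx) / mu * u
    return grad / n_dirs


# Toy example: minimize a quadratic using only function evaluations.
def objective(x):
    return np.sum((x - 1.0) ** 2)


rng = np.random.default_rng(0)
x = np.zeros(4)
for _ in range(200):
    x -= 0.05 * zo_gradient(objective, x, rng)
print(x)  # approaches the minimizer [1, 1, 1, 1]
```

Averaging over more directions reduces the variance of the estimate at the cost of extra function evaluations, which is the trade-off a learned sampling policy aims to improve.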