Local policy search with Bayesian optimization
- URL: http://arxiv.org/abs/2106.11899v1
- Date: Tue, 22 Jun 2021 16:07:02 GMT
- Title: Local policy search with Bayesian optimization
- Authors: Sarah Müller, Alexander von Rohr, Sebastian Trimpe
- Abstract summary: Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
- Score: 73.0364959221845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) aims to find an optimal policy by interaction
with an environment. Consequently, learning complex behavior requires a vast
number of samples, which can be prohibitive in practice. Nevertheless, instead
of systematically reasoning and actively choosing informative samples, policy
gradients for local search are often obtained from random perturbations. These
random samples yield high variance estimates and hence are sub-optimal in terms
of sample complexity. Actively selecting informative samples is at the core of
Bayesian optimization, which constructs a probabilistic surrogate of the
objective from past samples to reason about informative subsequent ones. In
this paper, we propose to join both worlds. We develop an algorithm utilizing a
probabilistic model of the objective function and its gradient. Based on the
model, the algorithm decides where to query a noisy zeroth-order oracle to
improve the gradient estimates. The resulting algorithm is a novel type of
policy search method, which we compare to existing black-box algorithms. The
comparison reveals improved sample complexity and reduced variance in extensive
empirical evaluations on synthetic objectives. Further, we highlight the
benefits of active sampling on popular RL benchmarks.
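As a rough illustration of the approach described in the abstract, the sketch below fits a Gaussian-process surrogate to noisy zeroth-order evaluations and takes ascent steps along the gradient of the GP posterior mean. The RBF kernel, the fixed hyperparameters, and the simple perturb-around-the-iterate query rule are illustrative assumptions, not the paper's actual model-based query selection.

```python
import numpy as np

# Minimal sketch (assumptions: RBF kernel, fixed hyperparameters, and a
# naive query rule; the paper's actual acquisition strategy differs).

def rbf(X1, X2, ls=0.5):
    """Squared-exponential kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_grad_estimate(theta, X, y, ls=0.5, noise=1e-2):
    """Gradient of the GP posterior mean at theta.

    m(x) = k(x, X) @ alpha with alpha = (K + noise * I)^{-1} y, so
    grad m(x) = sum_i alpha_i * grad_x k(x, x_i).
    """
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    k = rbf(theta[None, :], X, ls)[0]                # k(theta, x_i)
    dk = -(theta[None, :] - X) / ls**2 * k[:, None]  # grad_theta k(theta, x_i)
    return dk.T @ alpha

def oracle(theta, rng):
    """Noisy zeroth-order oracle; a toy quadratic stands in for a return."""
    return -np.sum(theta**2) + 0.1 * rng.standard_normal()

rng = np.random.default_rng(0)
theta = np.array([1.5, -1.0])
for _ in range(30):
    # Query near the current iterate (a stand-in for active query selection).
    X = theta + 0.3 * rng.standard_normal((8, theta.size))
    y = np.array([oracle(x, rng) for x in X])
    theta = theta + 0.2 * gp_grad_estimate(theta, X, y)  # local ascent step
print("final policy parameters:", theta)
```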
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency [18.17090625880964]
We study the methodology of exploring the domain using random samples drawn from a distribution.
We show that this random exploration approach achieves the optimal error rates.
arXiv Detail & Related papers (2023-10-23T20:30:44Z)
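A toy sketch of the random-exploration recipe summarized above: query points are drawn i.i.d. from a fixed distribution rather than by optimizing an acquisition function, and the GP surrogate is used only to choose the reported optimum. The objective, kernel, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Random-exploration BO sketch: i.i.d. queries, surrogate used only to
# recommend a point (objective and hyperparameters are assumptions).

def f(x):  # unknown objective; the caller adds observation noise
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=40)            # random exploration: i.i.d. queries
y = f(X) + 0.1 * rng.standard_normal(40)   # noisy zeroth-order observations

def k(a, b, ls=0.4):
    """RBF kernel matrix for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

# GP posterior mean on a candidate grid; its argmax is the recommendation.
grid = np.linspace(-2, 2, 201)
alpha = np.linalg.solve(k(X, X) + 1e-2 * np.eye(len(X)), y)
post_mean = k(grid, X) @ alpha
print("recommended x:", grid[np.argmax(post_mean)])
```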
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
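AutoSMOTE learns its over-sampling decisions with hierarchical RL; the sketch below shows only the classic SMOTE interpolation step that such methods build on: synthesize a minority-class point by interpolating between a sample and one of its nearest neighbors. The data and neighbor count are illustrative assumptions.

```python
import numpy as np

# Classic SMOTE step (not AutoSMOTE's learned policy): interpolate between
# a minority-class sample and one of its nearest minority-class neighbors.

def smote_once(X_min, n_neighbors=3, rng=None):
    rng = rng or np.random.default_rng()
    i = rng.integers(len(X_min))
    d = np.linalg.norm(X_min - X_min[i], axis=1)
    nn = np.argsort(d)[1:n_neighbors + 1]   # nearest neighbors, excluding self
    j = rng.choice(nn)
    lam = rng.uniform()                     # random interpolation coefficient
    return X_min[i] + lam * (X_min[j] - X_min[i])

rng = np.random.default_rng(2)
X_minority = rng.normal(size=(10, 2))       # toy minority-class data
synthetic = np.array([smote_once(X_minority, rng=rng) for _ in range(5)])
print(synthetic)
```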
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
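A heavily simplified sketch of the elimination idea above, reduced to identifying a single best item under an assumed Bradley-Terry-style comparison model; the paper itself handles heterogeneous users and recovers a full ranking. The budget and confidence radius are illustrative assumptions.

```python
import numpy as np

# Elimination-based active sampling sketch: duel surviving items at random
# and drop any item whose upper confidence bound falls below the best lower
# bound. Scores, budget, and radius are toy assumptions.

rng = np.random.default_rng(3)
true_scores = np.array([0.9, 0.6, 0.4, 0.2])   # hidden item qualities

def duel(i, j):
    """Noisy comparison: P(i beats j) under a Bradley-Terry-style model."""
    p = true_scores[i] / (true_scores[i] + true_scores[j])
    return rng.random() < p

alive = list(range(4))
wins, plays = np.zeros(4), np.zeros(4)
for t in range(1, 400):
    if len(alive) == 1:
        break
    i, j = rng.choice(alive, size=2, replace=False)
    w = i if duel(i, j) else j
    wins[w] += 1; plays[i] += 1; plays[j] += 1
    if t % 50 == 0:                              # periodic elimination round
        rate = wins[alive] / np.maximum(plays[alive], 1)
        rad = np.sqrt(np.log(t) / np.maximum(plays[alive], 1))
        best_lcb = np.max(rate - rad)
        alive = [a for a, r, c in zip(alive, rate, rad) if r + c >= best_lcb]
print("surviving item(s):", alive)
```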
- Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained optimization problems with no available gradient information.
We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a simulation function using finite differences within a common random number framework.
We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
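A sketch of the zeroth-order ingredient described above: forward finite differences evaluated under common random numbers (the same seed on both sides of the difference), so the shared simulation noise cancels. For brevity it takes plain gradient steps rather than quasi-Newton updates with adaptive sample sizes; the test function and step sizes are assumptions.

```python
import numpy as np

# Finite differences under common random numbers: reusing the seed on both
# sides of the difference cancels the shared additive simulation noise.
# Plain gradient descent stands in for the paper's quasi-Newton updates.

def noisy_f(x, seed):
    rng = np.random.default_rng(seed)            # common random numbers
    return np.sum((x - 1.0) ** 2) + 0.1 * rng.standard_normal()

def fd_gradient(x, h=1e-3, seed=0):
    """Forward finite-difference gradient with a shared noise seed."""
    g = np.zeros_like(x)
    fx = noisy_f(x, seed)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (noisy_f(x + e, seed) - fx) / h   # same seed on both sides
    return g

x = np.zeros(3)
for _ in range(100):
    x -= 0.1 * fd_gradient(x)                    # plain gradient step
print("approximate minimizer:", x)               # converges near [1, 1, 1]
```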
- Towards Feature-Based Performance Regression Using Trajectory Data [0.9281671380673306]
Black-box optimization is a very active area of research, with many new algorithms being developed every year.
The variety of algorithms poses a meta-problem: which algorithm to choose for a given problem at hand?
Past research has shown that per-instance algorithm selection based on exploratory landscape analysis can be an efficient means of tackling this meta-problem.
arXiv Detail & Related papers (2021-02-10T10:19:13Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.