Local policy search with Bayesian optimization
- URL: http://arxiv.org/abs/2106.11899v1
- Date: Tue, 22 Jun 2021 16:07:02 GMT
- Title: Local policy search with Bayesian optimization
- Authors: Sarah Müller, Alexander von Rohr, Sebastian Trimpe
- Abstract summary: Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
- Score: 73.0364959221845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) aims to find an optimal policy by interaction
with an environment. Consequently, learning complex behavior requires a vast
number of samples, which can be prohibitive in practice. Nevertheless, instead
of systematically reasoning and actively choosing informative samples, policy
gradients for local search are often obtained from random perturbations. These
random samples yield high variance estimates and hence are sub-optimal in terms
of sample complexity. Actively selecting informative samples is at the core of
Bayesian optimization, which constructs a probabilistic surrogate of the
objective from past samples to reason about informative subsequent ones. In
this paper, we propose to join both worlds. We develop an algorithm utilizing a
probabilistic model of the objective function and its gradient. Based on the
model, the algorithm decides where to query a noisy zeroth-order oracle to
improve the gradient estimates. The resulting algorithm is a novel type of
policy search method, which we compare to existing black-box algorithms. The
comparison reveals improved sample complexity and reduced variance in extensive
empirical evaluations on synthetic objectives. Further, we highlight the
benefits of active sampling on popular RL benchmarks.
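As a rough illustration of the approach described in the abstract, the sketch below fits a Gaussian-process surrogate to noisy zeroth-order evaluations and takes ascent steps along the gradient of the GP posterior mean. The RBF kernel, the fixed hyperparameters, and the simple perturb-around-the-iterate query rule are illustrative assumptions, not the paper's actual model-based query selection.

```python
import numpy as np

# Minimal sketch (assumptions: RBF kernel, fixed hyperparameters, and a
# naive query rule; the paper's actual acquisition strategy differs).

def rbf(X1, X2, ls=0.5):
    """Squared-exponential kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_grad_estimate(theta, X, y, ls=0.5, noise=1e-2):
    """Gradient of the GP posterior mean at theta.

    m(x) = k(x, X) @ alpha with alpha = (K + noise * I)^{-1} y, so
    grad m(x) = sum_i alpha_i * grad_x k(x, x_i).
    """
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    k = rbf(theta[None, :], X, ls)[0]                # k(theta, x_i)
    dk = -(theta[None, :] - X) / ls**2 * k[:, None]  # grad_theta k(theta, x_i)
    return dk.T @ alpha

def oracle(theta, rng):
    """Noisy zeroth-order oracle; a toy quadratic stands in for a return."""
    return -np.sum(theta**2) + 0.1 * rng.standard_normal()

rng = np.random.default_rng(0)
theta = np.array([1.5, -1.0])
for _ in range(30):
    # Query near the current iterate (a stand-in for active query selection).
    X = theta + 0.3 * rng.standard_normal((8, theta.size))
    y = np.array([oracle(x, rng) for x in X])
    theta = theta + 0.2 * gp_grad_estimate(theta, X, y)  # local ascent step
print("final policy parameters:", theta)
```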
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency [18.17090625880964]
We study the methodology of exploring the domain using random samples drawn from a distribution.
We show that this random exploration approach achieves the optimal error rates.
arXiv Detail & Related papers (2023-10-23T20:30:44Z)
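A toy sketch of the random-exploration recipe summarized above: query points are drawn i.i.d. from a fixed distribution rather than by optimizing an acquisition function, and the GP surrogate is used only to choose the reported optimum. The objective, kernel, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Random-exploration BO sketch: i.i.d. queries, surrogate used only to
# recommend a point (objective and hyperparameters are assumptions).

def f(x):  # unknown objective; the caller adds observation noise
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=40)            # random exploration: i.i.d. queries
y = f(X) + 0.1 * rng.standard_normal(40)   # noisy zeroth-order observations

def k(a, b, ls=0.4):
    """RBF kernel matrix for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

# GP posterior mean on a candidate grid; its argmax is the recommendation.
grid = np.linspace(-2, 2, 201)
alpha = np.linalg.solve(k(X, X) + 1e-2 * np.eye(len(X)), y)
post_mean = k(grid, X) @ alpha
print("recommended x:", grid[np.argmax(post_mean)])
```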
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
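AutoSMOTE learns its over-sampling decisions with hierarchical RL; the sketch below shows only the classic SMOTE interpolation step that such methods build on: synthesize a minority-class point by interpolating between a sample and one of its nearest neighbors. The data and neighbor count are illustrative assumptions.

```python
import numpy as np

# Classic SMOTE step (not AutoSMOTE's learned policy): interpolate between
# a minority-class sample and one of its nearest minority-class neighbors.

def smote_once(X_min, n_neighbors=3, rng=None):
    rng = rng or np.random.default_rng()
    i = rng.integers(len(X_min))
    d = np.linalg.norm(X_min - X_min[i], axis=1)
    nn = np.argsort(d)[1:n_neighbors + 1]   # nearest neighbors, excluding self
    j = rng.choice(nn)
    lam = rng.uniform()                     # random interpolation coefficient
    return X_min[i] + lam * (X_min[j] - X_min[i])

rng = np.random.default_rng(2)
X_minority = rng.normal(size=(10, 2))       # toy minority-class data
synthetic = np.array([smote_once(X_minority, rng=rng) for _ in range(5)])
print(synthetic)
```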
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
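A heavily simplified sketch of the elimination idea above, reduced to identifying a single best item under an assumed Bradley-Terry-style comparison model; the paper itself handles heterogeneous users and recovers a full ranking. The budget and confidence radius are illustrative assumptions.

```python
import numpy as np

# Elimination-based active sampling sketch: duel surviving items at random
# and drop any item whose upper confidence bound falls below the best lower
# bound. Scores, budget, and radius are toy assumptions.

rng = np.random.default_rng(3)
true_scores = np.array([0.9, 0.6, 0.4, 0.2])   # hidden item qualities

def duel(i, j):
    """Noisy comparison: P(i beats j) under a Bradley-Terry-style model."""
    p = true_scores[i] / (true_scores[i] + true_scores[j])
    return rng.random() < p

alive = list(range(4))
wins, plays = np.zeros(4), np.zeros(4)
for t in range(1, 400):
    if len(alive) == 1:
        break
    i, j = rng.choice(alive, size=2, replace=False)
    w = i if duel(i, j) else j
    wins[w] += 1; plays[i] += 1; plays[j] += 1
    if t % 50 == 0:                              # periodic elimination round
        rate = wins[alive] / np.maximum(plays[alive], 1)
        rad = np.sqrt(np.log(t) / np.maximum(plays[alive], 1))
        best_lcb = np.max(rate - rad)
        alive = [a for a, r, c in zip(alive, rate, rad) if r + c >= best_lcb]
print("surviving item(s):", alive)
```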
- Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained optimization problems with no available gradient information.
We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a simulation function using finite differences within a common random number framework.
We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
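A sketch of the zeroth-order ingredient described above: forward finite differences evaluated under common random numbers (the same seed on both sides of the difference), so the shared simulation noise cancels. For brevity it takes plain gradient steps rather than quasi-Newton updates with adaptive sample sizes; the test function and step sizes are assumptions.

```python
import numpy as np

# Finite differences under common random numbers: reusing the seed on both
# sides of the difference cancels the shared additive simulation noise.
# Plain gradient descent stands in for the paper's quasi-Newton updates.

def noisy_f(x, seed):
    rng = np.random.default_rng(seed)            # common random numbers
    return np.sum((x - 1.0) ** 2) + 0.1 * rng.standard_normal()

def fd_gradient(x, h=1e-3, seed=0):
    """Forward finite-difference gradient with a shared noise seed."""
    g = np.zeros_like(x)
    fx = noisy_f(x, seed)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (noisy_f(x + e, seed) - fx) / h   # same seed on both sides
    return g

x = np.zeros(3)
for _ in range(100):
    x -= 0.1 * fd_gradient(x)                    # plain gradient step
print("approximate minimizer:", x)               # converges near [1, 1, 1]
```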
- Towards Feature-Based Performance Regression Using Trajectory Data [0.9281671380673306]
Black-box optimization is a very active area of research, with many new algorithms being developed every year.
The variety of algorithms poses a meta-problem: which algorithm to choose for a given problem at hand?
Past research has shown that per-instance algorithm selection based on exploratory landscape analysis can be an efficient means of tackling this meta-problem.
arXiv Detail & Related papers (2021-02-10T10:19:13Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.