Robust Policy Search for Robot Navigation with Stochastic Meta-Policies
- URL: http://arxiv.org/abs/2003.01000v1
- Date: Mon, 2 Mar 2020 16:30:59 GMT
- Title: Robust Policy Search for Robot Navigation with Stochastic Meta-Policies
- Authors: Javier Garcia-Barcos, Ruben Martinez-Cantin
- Abstract summary: In this work, we exploit the main ingredients of Bayesian optimization to provide robustness to different issues for policy search algorithms.
We combine several methods and show how their interaction works better than the sum of the parts.
We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
- Score: 5.7871177330714145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization is an efficient nonlinear optimization method where the
queries are carefully selected to gather information about the optimum
location. Thus, in the context of policy search, it has been called active
policy search. The main ingredients of Bayesian optimization for sample
efficiency are the probabilistic surrogate model and the optimal decision
heuristics. In this work, we exploit those to provide robustness to different
issues for policy search algorithms. We combine several methods and show how
their interaction works better than the sum of the parts. First, to deal with
input noise and provide a safe and repeatable policy we use an improved version
of unscented Bayesian optimization. Then, to deal with mismodeling errors and
improve exploration we use stochastic meta-policies for query selection and an
adaptive kernel. We compare the proposed algorithm with previous results in
several optimization benchmarks and robot tasks, such as pushing objects with a
robot arm, or path finding with a rover.
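The three ingredients can be pictured with a short sketch. The snippet below is a hedged illustration of the idea, not the authors' implementation: it assumes a scikit-learn Gaussian process as the probabilistic surrogate, averages expected improvement over sigma points as a crude stand-in for unscented Bayesian optimization, and samples the next query from a softmax over acquisition scores, which is the spirit of a stochastic meta-policy. Every function name and parameter value is an illustrative assumption.

```python
# Minimal sketch of the abstract's ingredients, not the authors' code.
# Assumptions: scikit-learn GP surrogate, EI averaged over sigma points as a
# crude stand-in for unscented BO, and a softmax sample over acquisition
# scores as the "stochastic meta-policy" (query is sampled, not argmax'd).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(mu, sigma, best):
    """Standard EI for minimization, evaluated elementwise."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def sigma_points(x, input_noise_std):
    """2d+1 symmetric points around x, approximating query input noise."""
    d = x.shape[0]
    offsets = np.vstack([np.zeros((1, d)), np.eye(d), -np.eye(d)])
    return x + input_noise_std * offsets


def propose_query(gp, candidates, y_best, input_noise_std, temperature, rng):
    scores = np.empty(len(candidates))
    for i, x in enumerate(candidates):
        mu, sd = gp.predict(sigma_points(x, input_noise_std), return_std=True)
        # Average EI over the sigma points: favors regions that stay good
        # when the executed policy parameters are perturbed.
        scores[i] = expected_improvement(mu, sd, y_best).mean()
    # Stochastic meta-policy: sample the next query from a softmax over
    # acquisition scores instead of committing to the single maximizer.
    p = np.exp((scores - scores.max()) / temperature)
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]


# Toy usage: minimize a 2-D stand-in for the "cost of a policy".
rng = np.random.default_rng(0)
cost = lambda X: np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2
X = rng.uniform(-1.0, 1.0, size=(5, 2))
y = cost(X)
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    candidates = rng.uniform(-1.0, 1.0, size=(256, 2))
    x_next = propose_query(gp, candidates, y.min(), input_noise_std=0.05,
                           temperature=0.1, rng=rng)
    X = np.vstack([X, x_next])
    y = np.append(y, cost(x_next[None])[0])
print("best parameters:", X[y.argmin()], "estimated cost:", y.min())
```

Sampling the query rather than maximizing the acquisition keeps some exploration alive even when the surrogate is mismodeled, which is the robustness argument made in the abstract.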
Related papers
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised mean-seeking approximation of that optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies [0.0]
Policy optimization is the de facto paradigm to adapt robot policies as a function of task-specific objectives.
We propose to leverage the structure of probabilistic policies by casting the policy optimization as an optimal transport problem.
We evaluate our approach on common robotic settings: reaching motions, collision-avoidance behaviors, and multi-goal tasks.
arXiv Detail & Related papers (2023-05-17T17:48:24Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Tensor Train for Global Optimization Problems in Robotics [6.702251803443858]
The convergence of many numerical optimization techniques is highly dependent on the initial guess given to the solver.
We propose a novel approach that utilizes tensor train methods to initialize existing optimization solvers near global optima.
We show that the proposed method can generate samples close to global optima and from multiple modes.
arXiv Detail & Related papers (2022-06-10T13:18:26Z)
- Dimensionality Reduction and Prioritized Exploration for Policy Search [29.310742141970394]
Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates.
Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results.
arXiv Detail & Related papers (2022-03-09T15:17:09Z)
- Bayesian Optimization for auto-tuning GPU kernels [0.0]
Finding optimal parameter configurations for GPU kernels is a non-trivial exercise for large search spaces, even when automated.
We introduce a novel contextual exploration factor as well as new acquisition functions with improved scalability, combined with an informed function selection mechanism.
arXiv Detail & Related papers (2021-11-26T11:26:26Z)
- Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
We show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO).
OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.