Dimensionality Reduction and Prioritized Exploration for Policy Search
- URL: http://arxiv.org/abs/2203.04791v1
- Date: Wed, 9 Mar 2022 15:17:09 GMT
- Title: Dimensionality Reduction and Prioritized Exploration for Policy Search
- Authors: Marius Memmel, Puze Liu, Davide Tateo, Jan Peters
- Abstract summary: Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates.
Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results.
- Score: 29.310742141970394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box policy optimization is a class of reinforcement learning algorithms
that explores and updates the policies at the parameter level. This class of
algorithms is widely applied in robotics with movement primitives or
non-differentiable policies. Furthermore, these approaches are particularly
relevant where exploration at the action level could cause actuator damage or
other safety issues. However, black-box optimization does not scale well with
the increasing dimensionality of the policy, leading to a high demand for
samples, which are expensive to obtain in real-world systems. In many practical
applications, policy parameters do not contribute equally to the return.
Identifying the most relevant parameters makes it possible to narrow down the
exploration and speed up learning. Furthermore, updating only the effective parameters
requires fewer samples, improving the scalability of the method. We present a
novel method to prioritize the exploration of effective parameters and cope
with full covariance matrix updates. Our algorithm learns faster than recent
approaches and requires fewer samples to achieve state-of-the-art results. To
select the effective parameters, we consider both the Pearson correlation
coefficient and the Mutual Information. We showcase the capabilities of our
approach on the Relative Entropy Policy Search algorithm in several simulated
environments, including robotics simulations. Code is available at
https://git.ias.informatik.tu-darmstadt.de/ias_code/aistats2022/dr-creps.
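As a rough illustration of the parameter-selection step described in the abstract, the sketch below scores each policy-parameter dimension by its Pearson correlation or mutual information with the return and keeps only the top-k dimensions. This is a minimal sketch, not the authors' DR-CREPS implementation (their code is in the linked repository); the function names, the synthetic data, and the choice of `top_k` are illustrative assumptions.

```python
# Minimal sketch of relevance-based parameter selection, as described in the
# abstract: rank policy-parameter dimensions by Pearson correlation or mutual
# information with the return, then explore/update only the most relevant ones.
# NOT the authors' implementation; all names and numbers are illustrative.
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def parameter_relevance(params, returns, method="pearson"):
    """Score each parameter dimension by its dependence on the return.

    params:  (n_samples, n_params) sampled policy parameters
    returns: (n_samples,) returns obtained with those parameter samples
    """
    if method == "pearson":
        # Absolute Pearson correlation between each dimension and the return
        return np.array([abs(np.corrcoef(params[:, i], returns)[0, 1])
                         for i in range(params.shape[1])])
    # Mutual information also captures non-linear dependencies
    return mutual_info_regression(params, returns)


def select_effective_parameters(params, returns, top_k, method="pearson"):
    """Indices of the top_k most relevant parameter dimensions."""
    scores = parameter_relevance(params, returns, method)
    return np.argsort(scores)[::-1][:top_k]


# Toy usage: 200 parameter samples of dimension 50, where only the first
# 5 dimensions actually influence the (synthetic) return.
rng = np.random.default_rng(0)
params = rng.normal(size=(200, 50))
returns = params[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)
print(sorted(select_effective_parameters(params, returns, top_k=5)))
```

In a REPS-style loop, one would then focus exploration noise and covariance updates on the selected dimensions, which is the mechanism the abstract credits for the reduced sample demand.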
Related papers
- Augmented Bayesian Policy Search [14.292685001631945]
In practice, exploration is largely performed by deterministic policies.
First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies.
We introduce a novel mean function for the probabilistic model.
This results in augmenting BO methods with the action-value function.
arXiv Detail & Related papers (2024-07-05T20:56:45Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control [8.720903734757627]
We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass.
We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
arXiv Detail & Related papers (2022-02-28T09:31:15Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Robust Policy Search for Robot Navigation with Stochastic Meta-Policies [5.7871177330714145]
In this work, we exploit the main ingredients of Bayesian optimization to provide robustness to different issues for policy search algorithms.
We combine several methods and show how their interaction works better than the sum of the parts.
We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
arXiv Detail & Related papers (2020-03-02T16:30:59Z)
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)