Dimensionality Reduction and Prioritized Exploration for Policy Search
- URL: http://arxiv.org/abs/2203.04791v1
- Date: Wed, 9 Mar 2022 15:17:09 GMT
- Title: Dimensionality Reduction and Prioritized Exploration for Policy Search
- Authors: Marius Memmel, Puze Liu, Davide Tateo, Jan Peters
- Abstract summary: Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates.
Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results.
- Score: 29.310742141970394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box policy optimization is a class of reinforcement learning algorithms
that explores and updates the policies at the parameter level. This class of
algorithms is widely applied in robotics with movement primitives or
non-differentiable policies. Furthermore, these approaches are particularly
relevant where exploration at the action level could cause actuator damage or
other safety issues. However, black-box optimization does not scale well with
the increasing dimensionality of the policy, leading to a high demand for
samples, which are expensive to obtain in real-world systems. In many practical
applications, policy parameters do not contribute equally to the return.
Identifying the most relevant parameters makes it possible to narrow down the
exploration and speed up learning. Furthermore, updating only the effective parameters
requires fewer samples, improving the scalability of the method. We present a
novel method to prioritize the exploration of effective parameters and cope
with full covariance matrix updates. Our algorithm learns faster than recent
approaches and requires fewer samples to achieve state-of-the-art results. To
select the effective parameters, we consider both the Pearson correlation
coefficient and the Mutual Information. We showcase the capabilities of our
approach on the Relative Entropy Policy Search algorithm in several simulated
environments, including robotics simulations. Code is available at
https://git.ias.informatik.tu-darmstadt.de/ias_code/aistats2022/dr-creps.
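As a rough illustration of the parameter-selection step described in the abstract, the sketch below scores each policy-parameter dimension by its Pearson correlation or mutual information with the return and keeps only the top-k dimensions. This is a minimal sketch, not the authors' DR-CREPS implementation (their code is in the linked repository); the function names, the synthetic data, and the choice of `top_k` are illustrative assumptions.

```python
# Minimal sketch of relevance-based parameter selection, as described in the
# abstract: rank policy-parameter dimensions by Pearson correlation or mutual
# information with the return, then explore/update only the most relevant ones.
# NOT the authors' implementation; all names and numbers are illustrative.
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def parameter_relevance(params, returns, method="pearson"):
    """Score each parameter dimension by its dependence on the return.

    params:  (n_samples, n_params) sampled policy parameters
    returns: (n_samples,) returns obtained with those parameter samples
    """
    if method == "pearson":
        # Absolute Pearson correlation between each dimension and the return
        return np.array([abs(np.corrcoef(params[:, i], returns)[0, 1])
                         for i in range(params.shape[1])])
    # Mutual information also captures non-linear dependencies
    return mutual_info_regression(params, returns)


def select_effective_parameters(params, returns, top_k, method="pearson"):
    """Indices of the top_k most relevant parameter dimensions."""
    scores = parameter_relevance(params, returns, method)
    return np.argsort(scores)[::-1][:top_k]


# Toy usage: 200 parameter samples of dimension 50, where only the first
# 5 dimensions actually influence the (synthetic) return.
rng = np.random.default_rng(0)
params = rng.normal(size=(200, 50))
returns = params[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)
print(sorted(select_effective_parameters(params, returns, top_k=5)))
```

In a REPS-style loop, one would then focus exploration noise and covariance updates on the selected dimensions, which is the mechanism the abstract credits for the reduced sample demand.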
Related papers
- Augmented Bayesian Policy Search [14.292685001631945]
In practice, exploration is largely performed by deterministic policies.
First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies.
We introduce a novel mean function for the probabilistic model.
This results in augmenting BO methods with the action-value function.
arXiv Detail & Related papers (2024-07-05T20:56:45Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control [8.720903734757627]
We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass.
We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
arXiv Detail & Related papers (2022-02-28T09:31:15Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Robust Policy Search for Robot Navigation with Stochastic Meta-Policies [5.7871177330714145]
In this work, we exploit the main ingredients of Bayesian optimization to provide robustness to different issues for policy search algorithms.
We combine several methods and show how their interaction works better than the sum of the parts.
We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
arXiv Detail & Related papers (2020-03-02T16:30:59Z)
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)