Policy Manifold Search: Exploring the Manifold Hypothesis for
Diversity-based Neuroevolution
- URL: http://arxiv.org/abs/2104.13424v1
- Date: Tue, 27 Apr 2021 18:52:03 GMT
- Title: Policy Manifold Search: Exploring the Manifold Hypothesis for
Diversity-based Neuroevolution
- Authors: Nemanja Rakicevic, Antoine Cully, Petar Kormushev
- Abstract summary: This paper proposes a novel method for diversity-based policy search via Neuroevolution.
We use the Quality-Diversity framework which provides a principled approach to policy search.
We also use the Jacobian of the inverse-mapping function to guide the search in the representation space.
- Score: 4.920145245773581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neuroevolution is an alternative to gradient-based optimisation that has the
potential to avoid local minima and allows parallelisation. The main limiting
factor is that usually it does not scale well with parameter space
dimensionality. Inspired by recent work examining neural network intrinsic
dimension and loss landscapes, we hypothesise that there exists a
low-dimensional manifold, embedded in the policy network parameter space,
around which a high density of diverse and useful policies is located. This
paper proposes a novel method for diversity-based policy search via
Neuroevolution that leverages learned representations of the policy network
parameters, by performing policy search in this learned representation space.
Our method relies on the Quality-Diversity (QD) framework which provides a
principled approach to policy search, and maintains a collection of diverse
policies, used as a dataset for learning policy representations. Further, we
use the Jacobian of the inverse-mapping function to guide the search in the
representation space. This ensures that the generated samples remain in the
high-density regions, after mapping back to the original space. Finally, we
evaluate our contributions on four continuous-control tasks in simulated
environments, and compare to diversity-based baselines.
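As a rough illustration of this idea, the sketch below performs the perturbation step in a learned latent space of policy parameters and uses the Jacobian of the decoder (the inverse-mapping) to scale the step, so the perturbed sample stays near the high-density region once mapped back to parameter space. The decoder architecture, latent dimensionality, and scaling rule are assumptions made for illustration, not the paper's exact components.
```python
# Hedged sketch: Jacobian-guided search in a learned policy-parameter latent space.
# The decoder, latent dimension, and scaling rule are illustrative assumptions,
# not the authors' exact implementation.
import torch
import torch.nn as nn

PARAM_DIM = 256   # flattened policy-network parameter count (assumed)
LATENT_DIM = 8    # assumed latent dimensionality

decoder = nn.Sequential(          # inverse-mapping: latent -> parameter space
    nn.Linear(LATENT_DIM, 64), nn.Tanh(), nn.Linear(64, PARAM_DIM)
)

def jacobian_guided_sample(z, step=0.1):
    """Perturb a latent code so the decoded perturbation stays small in parameter space."""
    J = torch.autograd.functional.jacobian(decoder, z)      # shape (PARAM_DIM, LATENT_DIM)
    # Scale the random step by the local sensitivity of the inverse mapping, so latent
    # directions that expand strongly in parameter space receive smaller steps.
    sensitivity = J.norm(dim=0) + 1e-8                      # per-latent-dimension gain
    noise = torch.randn(LATENT_DIM) * step / sensitivity
    return z + noise

z = torch.zeros(LATENT_DIM)        # latent code of a policy taken from the QD collection
z_new = jacobian_guided_sample(z)
theta_new = decoder(z_new)         # candidate policy parameters to evaluate in the environment
```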
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, a stochastic (hyper)policy is learned during training, but only its deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
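As a minimal, hedged illustration of the stochastic-training/deterministic-deployment pattern this entry refers to, the sketch below uses an assumed linear-Gaussian policy whose exploration level sigma is the knob trading off sample complexity against the performance of the deployed deterministic policy.
```python
# Minimal sketch of "train stochastic, deploy deterministic". The linear-Gaussian
# policy and the exploration level sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(4)          # policy parameters (assumed linear policy)
sigma = 0.3                  # exploration level: tunes sample complexity vs. deployed performance

def act_train(state):
    """Stochastic action used while learning (exploration in action space)."""
    return theta @ state + rng.normal(0.0, sigma)

def act_deploy(state):
    """Deterministic version of the same policy, used at deployment time."""
    return theta @ state

state = rng.normal(size=4)
print(act_train(state), act_deploy(state))
```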
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
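A hedged sketch of the dispersion-matrix idea summarised above: policy embeddings are stacked, a pairwise-similarity matrix is formed, and its log-determinant serves as a diversity score for a set of policies. The RBF similarity and the determinant criterion are illustrative assumptions, not necessarily the construction used in the paper.
```python
# Illustrative sketch: score the diversity of a set of policy embeddings via a
# dispersion (pairwise-similarity) matrix. RBF kernel and log-det score are assumed choices.
import numpy as np

def dispersion_score(embeddings, bandwidth=1.0):
    """embeddings: (n_policies, d) array of policy embeddings (e.g. from a transformer encoder)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dists = (diffs ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))   # dispersion matrix: large off-diagonals = similar policies
    sign, logdet = np.linalg.slogdet(K)            # larger log-det = more mutually dissimilar set
    return logdet

rng = np.random.default_rng(0)
diverse = rng.normal(size=(5, 16))
clustered = np.tile(rng.normal(size=(1, 16)), (5, 1)) + 0.01 * rng.normal(size=(5, 16))
print(dispersion_score(diverse) > dispersion_score(clustered))  # True: the diverse set scores higher
```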
- Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879]
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior, learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
arXiv Detail & Related papers (2022-02-11T18:08:02Z)
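The sketch below illustrates the kind of Gaussian-Process-plus-Bayesian-optimisation loop this entry describes: a GP models policy performance as a function of the conditioning parameter of a universal policy network, and an upper-confidence-bound rule picks the next candidate to evaluate. Here the GP is fitted online rather than learned in simulation, and the kernel, acquisition rule, and one-dimensional parameterisation are assumptions for illustration.
```python
# Illustrative sketch: GP-guided selection of a policy from a universal policy network,
# parameterised here by a single latent variable z. Kernel and acquisition are assumed choices.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def evaluate_policy(z):
    """Stand-in for rolling out the universal policy conditioned on z in the target environment."""
    return float(-(z - 0.3) ** 2 + 0.05 * np.random.randn())

candidates = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
X, y = [], []
for _ in range(15):                                          # Bayesian optimisation loop
    if X:
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2).fit(X, y)
        mu, std = gp.predict(candidates, return_std=True)
        z_next = candidates[np.argmax(mu + 1.0 * std)]       # UCB acquisition
    else:
        z_next = candidates[np.random.randint(len(candidates))]
    X.append(z_next)
    y.append(evaluate_policy(z_next[0]))

print("best conditioning parameter found:", X[int(np.argmax(y))][0])
```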
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs between different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to MP algorithms reveals that the smoothness, or lack thereof, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Policy Manifold Search for Improving Diversity-based Neuroevolution [4.920145245773581]
We propose a novel approach to diversity-based policy search via Neuroevolution.
Our approach iteratively collects policies according to the Quality-Diversity framework.
We use the Jacobian of the inverse transformation to guide the search in the latent space.
arXiv Detail & Related papers (2020-12-15T23:59:49Z)
- Policy Optimization as Online Learning with Mediator Feedback [46.845765216238135]
Policy Optimization (PO) is a widely used approach to address continuous control tasks.
In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.
We propose an algorithm, RANDomized-exploration policy Optimization via Multiple Importance Sampling with Truncation (RIST), for regret minimization.
arXiv Detail & Related papers (2020-12-15T11:34:29Z)
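As a hedged illustration of the estimator family behind this entry, the sketch below computes a multiple-importance-sampling estimate of a target (hyper)policy's value from samples drawn by several behavioural (hyper)policies, with the importance weights truncated to keep the variance bounded. The Gaussian parameter distributions, balance-heuristic mixture, and truncation threshold are assumptions, not the paper's exact algorithm.
```python
# Illustrative sketch of multiple importance sampling with truncated weights.
# Gaussian (hyper)policies, the balance heuristic, and the truncation level are assumed.
import numpy as np
from scipy.stats import norm

def mis_truncated_value(samples, returns, behaviour_params, target_param, sigma=0.5, trunc=10.0):
    """samples: parameters drawn from the behaviour (hyper)policies; returns: their observed returns."""
    # Balance-heuristic mixture density over all behaviour (hyper)policies (equal sample counts assumed).
    mixture = np.mean([norm.pdf(samples, loc=m, scale=sigma) for m in behaviour_params], axis=0)
    weights = norm.pdf(samples, loc=target_param, scale=sigma) / mixture
    weights = np.minimum(weights, trunc)              # truncation keeps the estimator's variance bounded
    return float(np.mean(weights * returns))

rng = np.random.default_rng(0)
behaviour_params = [0.0, 0.5]
samples = np.concatenate([rng.normal(m, 0.5, size=100) for m in behaviour_params])
returns = -(samples - 0.4) ** 2                       # toy performance of each sampled parameter
print(mis_truncated_value(samples, returns, behaviour_params, target_param=0.4))
```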
- A Study of Policy Gradient on a Class of Exactly Solvable Models [35.90565839381652]
We explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain.
Our approach relies heavily on random walk theory, specifically on affine Weyl groups.
We analyze the probabilistic convergence of policy gradient to different local maxima of the value function.
arXiv Detail & Related papers (2020-11-03T17:27:53Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.