Policy Manifold Search for Improving Diversity-based Neuroevolution
- URL: http://arxiv.org/abs/2012.08676v1
- Date: Tue, 15 Dec 2020 23:59:49 GMT
- Title: Policy Manifold Search for Improving Diversity-based Neuroevolution
- Authors: Nemanja Rakicevic, Antoine Cully and Petar Kormushev
- Abstract summary: We propose a novel approach to diversity-based policy search via Neuroevolution.
Our approach iteratively collects policies according to the Quality-Diversity framework.
We use the Jacobian of the inverse transformation to guide the search in the latent space.
- Score: 4.920145245773581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diversity-based approaches have recently gained popularity as an alternative
paradigm to performance-based policy search. A popular approach from this
family, Quality-Diversity (QD), maintains a collection of high-performing
policies separated in the diversity-metric space, defined based on policies'
rollout behaviours. When policies are parameterised as neural networks, i.e.
Neuroevolution, QD tends to not scale well with parameter space dimensionality.
Our hypothesis is that there exists a low-dimensional manifold embedded in the
policy parameter space, containing a high density of diverse and feasible
policies. We propose a novel approach to diversity-based policy search via
Neuroevolution, that leverages learned latent representations of the policy
parameters which capture the local structure of the data. Our approach
iteratively collects policies according to the QD framework, in order to (i)
build a collection of diverse policies, (ii) use it to learn a latent
representation of the policy parameters, (iii) perform policy search in the
learned latent space. We use the Jacobian of the inverse transformation
(i.e. the reconstruction function) to guide the search in the latent space. This
ensures that the generated samples remain in the high-density regions of the
original space, after reconstruction. We evaluate our contributions on three
continuous control tasks in simulated environments, and compare to
diversity-based baselines. The findings suggest that our approach yields a more
efficient and robust policy search process.
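
Below is a rough, illustrative sketch of the latent-space mutation step described in the abstract: a latent code is perturbed, and the perturbation is shaped by the Jacobian of a decoder (the reconstruction function) so that the reconstructed parameter-space step stays small and close to the learned manifold. The linear `decode` function is a hypothetical stand-in for a trained autoencoder over policy parameters, and the specific scaling rule (an inverse of J^T J) is an assumption, not the paper's exact algorithm.

```python
import numpy as np

def numerical_jacobian(f, z, eps=1e-5):
    """Finite-difference Jacobian of f: latent space -> parameter space, at z."""
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def jacobian_scaled_mutation(decode, z, sigma=0.1, ridge=1e-6, seed=None):
    """Sample a latent perturbation whose covariance is shaped by (J^T J)^{-1},
    so the reconstructed parameter-space step has a roughly controlled size of
    sigma (one possible reading of 'Jacobian-guided' latent search)."""
    rng = np.random.default_rng(seed)
    J = numerical_jacobian(decode, z)                     # (n_params, n_latent)
    cov = sigma**2 * np.linalg.inv(J.T @ J + ridge * np.eye(z.size))
    dz = rng.multivariate_normal(np.zeros(z.size), cov)
    return z + dz, decode(z + dz)                         # new latent code and policy params

# Toy usage with a hypothetical linear decoder standing in for a trained AE decoder.
W = np.random.default_rng(0).normal(size=(256, 8))        # 256 policy params, 8 latent dims
decode = lambda z: W @ z
z_new, theta_new = jacobian_scaled_mutation(decode, np.zeros(8), sigma=0.05, seed=1)
```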
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
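
A minimal sketch of the learn-stochastic, deploy-deterministic pattern this entry refers to: a Gaussian policy with a tunable exploration level sigma is used while learning, and only its noiseless mean is deployed. The linear policy class and the names below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

class GaussianPolicy:
    """Linear-Gaussian policy: act with exploration noise while learning,
    deploy the noiseless mean afterwards."""
    def __init__(self, dim_state, dim_action, sigma=0.2, seed=0):
        self.W = np.zeros((dim_action, dim_state))   # learnable parameters
        self.sigma = sigma                           # exploration level to be tuned
        self.rng = np.random.default_rng(seed)

    def act_explore(self, s):                        # stochastic policy used for learning
        return self.W @ s + self.sigma * self.rng.normal(size=self.W.shape[0])

    def act_deploy(self, s):                         # deterministic version that gets deployed
        return self.W @ s

# Larger sigma explores more (helping the sample complexity of learning),
# but W is then tuned under noise, which can hurt the deployed mean policy.
pi = GaussianPolicy(dim_state=4, dim_action=2, sigma=0.3)
a_train = pi.act_explore(np.ones(4))
a_test = pi.act_deploy(np.ones(4))
```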
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this is the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
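
A hedged sketch of the single-time-scale primal-dual idea: the policy parameters and the Lagrange multiplier are updated simultaneously with fixed step sizes, using placeholder estimates of the return gradient, the cost gradient, and the incurred cost. This is a generic illustration of the technique, not the paper's specific algorithm or its convergence-guaranteed variant.

```python
import numpy as np

def primal_dual_step(theta, lam, grad_return, grad_cost, cost_value,
                     budget, lr_theta=0.05, lr_lam=0.05):
    """One simultaneous (single-time-scale) update: gradient ascent on the
    Lagrangian J_r - lam * J_c in the policy parameters, and projected
    dual ascent on the constraint violation in lam."""
    theta = theta + lr_theta * (grad_return - lam * grad_cost)   # primal step
    lam = max(0.0, lam + lr_lam * (cost_value - budget))         # dual step, projected to lam >= 0
    return theta, lam

# Toy usage with placeholder estimates (stand-ins for Monte-Carlo estimates
# of the return gradient, cost gradient, and expected cost from rollouts).
rng = np.random.default_rng(0)
theta, lam = rng.normal(size=(4, 2)), 0.0        # e.g. logits of a small tabular policy
for _ in range(10):
    grad_r = rng.normal(size=theta.shape)
    grad_c = rng.normal(size=theta.shape)
    cost = rng.uniform(0.0, 2.0)
    theta, lam = primal_dual_step(theta, lam, grad_r, grad_c, cost, budget=1.0)
```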
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
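
One plausible concrete reading of a "dispersion matrix" over stacked policy embeddings, shown only as an illustration: build a pairwise-similarity (kernel) matrix of the embeddings and use its log-determinant as a diversity score to be maximised. The RBF kernel and log-det score are assumptions; the paper's construction may differ.

```python
import numpy as np

def dispersion_matrix(embeddings, length_scale=1.0):
    """Pairwise similarity of stacked policy embeddings (one row per policy).
    The RBF kernel is an illustrative choice, not necessarily the paper's."""
    sq = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length_scale ** 2))

def diversity_score(embeddings, jitter=1e-6):
    """Log-determinant of the dispersion matrix: larger when the policy
    embeddings are more spread out (a DPP-style notion of diversity)."""
    K = dispersion_matrix(embeddings)
    _, logdet = np.linalg.slogdet(K + jitter * np.eye(len(embeddings)))
    return logdet

# Toy usage: embeddings of 5 policies (stand-ins for transformer outputs).
E = np.random.default_rng(0).normal(size=(5, 16))
print(diversity_score(E))
```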
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment [1.5229257192293197]
We propose a systematic methodology to learn a sequence of optimal control policies non-parametrically.
Our method outperforms the well-established DDPG and TD3 algorithms by a sizeable margin in terms of learning performance.
arXiv Detail & Related papers (2022-03-24T21:41:13Z)
- Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879]
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
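
A minimal NumPy sketch of the Gaussian-Process-plus-Bayesian-optimisation loop described here: a GP posterior over observed policy performances scores candidate parameterisations of a (hypothetical) universal policy network, and a UCB rule picks the next one to evaluate. The simulation-learned prior from the paper is replaced by a generic RBF kernel, which is an assumption.

```python
import numpy as np

def rbf(a, b, length_scale=0.5):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length_scale ** 2))

def gp_posterior(X, y, Xq, noise=1e-3):
    """GP posterior mean/std at query points Xq given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xq), rbf(Xq, Xq)
    K_inv = np.linalg.inv(K)
    mean = Ks.T @ K_inv @ y
    var = np.diag(Kss - Ks.T @ K_inv @ Ks)
    return mean, np.sqrt(np.maximum(var, 0.0))

def ucb_pick(X, y, candidates, beta=2.0):
    """Pick the candidate policy parameterisation with the highest UCB score."""
    mean, std = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mean + beta * std)]

# Toy usage: a 1-D conditioning 'knob' of a hypothetical universal policy
# network, with returns measured from a handful of evaluations in the new env.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(6, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=6)
candidates = np.linspace(-1.0, 1.0, 200)[:, None]
next_knob = ucb_pick(X, y, candidates)
```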
arXiv Detail & Related papers (2022-02-11T18:08:02Z)
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that takes into account the user's preferences for handling the trade-offs between different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
- Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution [4.920145245773581]
This paper proposes a novel method for diversity-based policy search via Neuroevolution.
We use the Quality-Diversity framework which provides a principled approach to policy search.
We also use the Jacobian of the inverse-mapping function to guide the search in the representation space.
arXiv Detail & Related papers (2021-04-27T18:52:03Z)
- A Study of Policy Gradient on a Class of Exactly Solvable Models [35.90565839381652]
We explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain.
Our approach relies heavily on random walk theory, specifically on affine Weyl groups.
We analyze the probabilistic convergence of policy gradient to different local maxima of the value function.
arXiv Detail & Related papers (2020-11-03T17:27:53Z)
- Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z)
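
A hedged sketch of the regularised objective this last entry describes: a standard importance-sampled policy-improvement surrogate minus a proximity penalty on how far the new policy's discounted state-action visitation drifts from the previous policy's. Estimating that divergence is the paper's actual contribution and is simply assumed to be given here.

```python
import numpy as np

def regularized_objective(ratio, advantage, divergence, coef=1.0):
    """Importance-sampled policy-improvement surrogate minus a proximity
    penalty. `divergence` stands in for an estimate of how far the new
    policy's discounted state-action visitation has drifted from the
    previous policy's; producing that estimate is assumed to be solved."""
    return (ratio * advantage).mean() - coef * divergence

# Toy usage with placeholder batch statistics.
rng = np.random.default_rng(0)
ratio = np.exp(rng.normal(scale=0.1, size=64))   # pi_new(a|s) / pi_old(a|s) on a batch
adv = rng.normal(size=64)                        # advantage estimates
div = 0.02                                       # placeholder visitation-divergence estimate
objective = regularized_objective(ratio, adv, div)   # maximise w.r.t. the new policy
```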
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.