Exploring Novel Quality Diversity Methods For Generalization in
Reinforcement Learning
- URL: http://arxiv.org/abs/2303.14592v1
- Date: Sun, 26 Mar 2023 00:23:29 GMT
- Title: Exploring Novel Quality Diversity Methods For Generalization in
Reinforcement Learning
- Authors: Brad Windsor, Brandon O'Shea, Mengxi Wu
- Abstract summary: The Reinforcement Learning field is strong on achievements and weak on reapplication.
This paper asks whether the method of training networks improves their generalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Reinforcement Learning field is strong on achievements and weak on
reapplication; a computer playing Go at a super-human level is still terrible
at Tic-Tac-Toe. This paper asks whether the method of training networks
improves their generalization. Specifically, we explore core quality diversity
algorithms, compare against two recent algorithms, and propose a new algorithm
to deal with shortcomings in existing methods. Although results of these
methods are well below the performance hoped for, our work raises important
points about the choice of behavior criterion in quality diversity, the
interaction of differential and evolutionary training methods, and the role of
offline reinforcement learning and randomized learning in evolutionary search.
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Diverse Policies Converge in Reward-free Markov Decision Processe [19.42193141047252]
We provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies.
Under such a framework, we also propose a provably efficient diversity reinforcement learning algorithm.
arXiv Detail & Related papers (2023-08-23T05:17:51Z)
- Evolutionary Strategy Guided Reinforcement Learning via MultiBuffer Communication [0.0]
We introduce a new Evolutionary Reinforcement Learning model which combines a particular family of evolutionary algorithms called Evolutionary Strategies with the off-policy Deep Reinforcement Learning algorithm TD3.
The proposed algorithm is demonstrated to perform competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks.
arXiv Detail & Related papers (2023-06-20T13:41:57Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents [0.0]
We focus on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy Optimization (PPO) reinforcement learning algorithm.
We analyze how the methods differ with respect to: (i) general efficacy, (ii) ability to cope with sparse rewards, (iii) propensity/capacity to discover minimal solutions, (iv) dependency on reward shaping, and (v) ability to cope with variations of the environmental conditions.
arXiv Detail & Related papers (2022-05-16T11:51:36Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Behavior-based Neuroevolutionary Training in Reinforcement Learning [3.686320043830301]
This work presents a hybrid algorithm that combines neuroevolutionary optimization with value-based reinforcement learning.
For this purpose, we consolidate different methods to generate and optimize agent policies, creating a diverse population.
Our results indicate that combining methods can enhance the sample efficiency and learning speed for evolutionary approaches.
arXiv Detail & Related papers (2021-05-17T15:40:42Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named the zero-shot translation class-incremental method (ZSTCI).
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.