Lifelong Policy Gradient Learning of Factored Policies for Faster
Training Without Forgetting
- URL: http://arxiv.org/abs/2007.07011v2
- Date: Wed, 21 Oct 2020 20:36:14 GMT
- Title: Lifelong Policy Gradient Learning of Factored Policies for Faster
Training Without Forgetting
- Authors: Jorge A. Mendez and Boyu Wang and Eric Eaton
- Abstract summary: We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients.
We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines.
- Score: 26.13332231423652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient methods have shown success in learning control policies for
high-dimensional dynamical systems. Their biggest downside is the amount of
exploration they require before yielding high-performing policies. In a
lifelong learning setting, in which an agent is faced with multiple consecutive
tasks over its lifetime, reusing information from previously seen tasks can
substantially accelerate the learning of new tasks. We provide a novel method
for lifelong policy gradient learning that trains lifelong function
approximators directly via policy gradients, allowing the agent to benefit from
accumulated knowledge throughout the entire training process. We show
empirically that our algorithm learns faster and converges to better policies
than single-task and lifelong learning baselines, and completely avoids
catastrophic forgetting on a variety of challenging domains.
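As a rough illustration of the factored-policy idea named in the title, the following is a minimal sketch (Python/NumPy, not the authors' implementation): each task's policy parameters are assumed to factor as theta_t = L s_t, with a knowledge base L shared across tasks and a task-specific coefficient vector s_t, and both factors are updated directly with REINFORCE-style policy gradients, as the abstract describes. The toy linear-Gaussian control problem, the dimensions, and the learning rates below are illustrative assumptions, not values from the paper.

# Minimal sketch of lifelong policy-gradient learning of a factored policy.
# Assumption: per-task policy parameters factor as theta_t = L @ s_t, where
# L is a shared knowledge base and s_t are task-specific coefficients.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, N_TASKS = 4, 3, 5
SIGMA = 0.5            # fixed exploration noise of the Gaussian policy (assumed)
LR_L, LR_S = 1e-3, 1e-2

L = 0.1 * rng.standard_normal((STATE_DIM, LATENT_DIM))   # shared knowledge base
S = 0.1 * rng.standard_normal((LATENT_DIM, N_TASKS))     # per-task coefficients
baselines = np.zeros(N_TASKS)                            # running return baselines

def rollout(theta, goal, horizon=20):
    """Run one episode with a linear-Gaussian policy a ~ N(theta @ state, SIGMA^2)."""
    state = rng.standard_normal(STATE_DIM)
    states, actions, ret = [], [], 0.0
    for _ in range(horizon):
        action = float(theta @ state) + SIGMA * rng.standard_normal()
        ret += -(action - float(goal @ state)) ** 2   # track a task-specific target
        states.append(state.copy())
        actions.append(action)
        state = 0.9 * state + 0.1 * rng.standard_normal(STATE_DIM)
    return np.array(states), np.array(actions), ret

def reinforce_grad(theta, states, actions, weight):
    """REINFORCE: gradient of the episode log-likelihood, weighted by the advantage."""
    residuals = actions - states @ theta              # shape (horizon,)
    return weight * (residuals[:, None] * states).sum(axis=0) / SIGMA ** 2

# Each task is a hypothetical target direction the policy should track.
goals = rng.standard_normal((N_TASKS, STATE_DIM))

for episode in range(3000):
    t = episode % N_TASKS                  # tasks encountered over the agent's lifetime
    theta_t = L @ S[:, t]                  # factored policy parameters for task t
    states, actions, ret = rollout(theta_t, goals[t])
    advantage = ret - baselines[t]
    baselines[t] = 0.9 * baselines[t] + 0.1 * ret
    g = reinforce_grad(theta_t, states, actions, advantage)
    g = np.clip(g, -10.0, 10.0)            # crude stabilization for this toy example
    # Chain rule through theta_t = L @ s_t: the shared basis and the
    # task-specific coefficients are both updated from the same policy gradient.
    L += LR_L * np.outer(g, S[:, t])
    S[:, t] += LR_S * (L.T @ g)

print("final per-task returns:",
      [round(rollout(L @ S[:, t], goals[t])[2], 2) for t in range(N_TASKS)])

In this sketch the shared basis L is updated on every task's episode, so later tasks immediately reuse structure learned earlier, which is the mechanism the abstract credits for faster training; a full method would additionally need to keep those shared updates from degrading earlier tasks' policies, the forgetting problem the paper targets.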
Related papers
- IOB: Integrating Optimization Transfer and Behavior Transfer for
Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z) - Reactive Exploration to Cope with Non-Stationarity in Lifelong
Reinforcement Learning [4.489095027077955]
We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning.
We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning.
arXiv Detail & Related papers (2022-07-12T17:59:00Z) - TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement
Learning [33.512849582347734]
We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
arXiv Detail & Related papers (2022-05-26T17:49:12Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and lower running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - Lifetime policy reuse and the importance of task capacity [6.390849000337326]
Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies.
This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies, and 2) task-capacity-based pre-selection of the policies.
The results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
arXiv Detail & Related papers (2021-06-03T10:42:49Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate reinforcement learning (RL).
Our framework learns when and which source policy is best to reuse for the target policy, and when to terminate the reuse.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.