Lifelong Policy Gradient Learning of Factored Policies for Faster
Training Without Forgetting
- URL: http://arxiv.org/abs/2007.07011v2
- Date: Wed, 21 Oct 2020 20:36:14 GMT
- Title: Lifelong Policy Gradient Learning of Factored Policies for Faster
Training Without Forgetting
- Authors: Jorge A. Mendez and Boyu Wang and Eric Eaton
- Abstract summary: We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients.
We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines.
- Score: 26.13332231423652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient methods have shown success in learning control policies for
high-dimensional dynamical systems. Their biggest downside is the amount of
exploration they require before yielding high-performing policies. In a
lifelong learning setting, in which an agent is faced with multiple consecutive
tasks over its lifetime, reusing information from previously seen tasks can
substantially accelerate the learning of new tasks. We provide a novel method
for lifelong policy gradient learning that trains lifelong function
approximators directly via policy gradients, allowing the agent to benefit from
accumulated knowledge throughout the entire training process. We show
empirically that our algorithm learns faster and converges to better policies
than single-task and lifelong learning baselines, and completely avoids
catastrophic forgetting on a variety of challenging domains.
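As a rough illustration of the factored-policy idea named in the title, the following is a minimal sketch (Python/NumPy, not the authors' implementation): each task's policy parameters are assumed to factor as theta_t = L s_t, with a knowledge base L shared across tasks and a task-specific coefficient vector s_t, and both factors are updated directly with REINFORCE-style policy gradients, as the abstract describes. The toy linear-Gaussian control problem, the dimensions, and the learning rates below are illustrative assumptions, not values from the paper.

# Minimal sketch of lifelong policy-gradient learning of a factored policy.
# Assumption: per-task policy parameters factor as theta_t = L @ s_t, where
# L is a shared knowledge base and s_t are task-specific coefficients.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, N_TASKS = 4, 3, 5
SIGMA = 0.5            # fixed exploration noise of the Gaussian policy (assumed)
LR_L, LR_S = 1e-3, 1e-2

L = 0.1 * rng.standard_normal((STATE_DIM, LATENT_DIM))   # shared knowledge base
S = 0.1 * rng.standard_normal((LATENT_DIM, N_TASKS))     # per-task coefficients
baselines = np.zeros(N_TASKS)                            # running return baselines

def rollout(theta, goal, horizon=20):
    """Run one episode with a linear-Gaussian policy a ~ N(theta @ state, SIGMA^2)."""
    state = rng.standard_normal(STATE_DIM)
    states, actions, ret = [], [], 0.0
    for _ in range(horizon):
        action = float(theta @ state) + SIGMA * rng.standard_normal()
        ret += -(action - float(goal @ state)) ** 2   # track a task-specific target
        states.append(state.copy())
        actions.append(action)
        state = 0.9 * state + 0.1 * rng.standard_normal(STATE_DIM)
    return np.array(states), np.array(actions), ret

def reinforce_grad(theta, states, actions, weight):
    """REINFORCE: gradient of the episode log-likelihood, weighted by the advantage."""
    residuals = actions - states @ theta              # shape (horizon,)
    return weight * (residuals[:, None] * states).sum(axis=0) / SIGMA ** 2

# Each task is a hypothetical target direction the policy should track.
goals = rng.standard_normal((N_TASKS, STATE_DIM))

for episode in range(3000):
    t = episode % N_TASKS                  # tasks encountered over the agent's lifetime
    theta_t = L @ S[:, t]                  # factored policy parameters for task t
    states, actions, ret = rollout(theta_t, goals[t])
    advantage = ret - baselines[t]
    baselines[t] = 0.9 * baselines[t] + 0.1 * ret
    g = reinforce_grad(theta_t, states, actions, advantage)
    g = np.clip(g, -10.0, 10.0)            # crude stabilization for this toy example
    # Chain rule through theta_t = L @ s_t: the shared basis and the
    # task-specific coefficients are both updated from the same policy gradient.
    L += LR_L * np.outer(g, S[:, t])
    S[:, t] += LR_S * (L.T @ g)

print("final per-task returns:",
      [round(rollout(L @ S[:, t], goals[t])[2], 2) for t in range(N_TASKS)])

In this sketch the shared basis L is updated on every task's episode, so later tasks immediately reuse structure learned earlier, which is the mechanism the abstract credits for faster training; a full method would additionally need to keep those shared updates from degrading earlier tasks' policies, the forgetting problem the paper targets.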
Related papers
- IOB: Integrating Optimization Transfer and Behavior Transfer for
Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z) - Reactive Exploration to Cope with Non-Stationarity in Lifelong
Reinforcement Learning [4.489095027077955]
We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning.
We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning.
arXiv Detail & Related papers (2022-07-12T17:59:00Z) - TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement
Learning [33.512849582347734]
We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
arXiv Detail & Related papers (2022-05-26T17:49:12Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and lower running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - Lifetime policy reuse and the importance of task capacity [6.390849000337326]
Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies.
This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies, and 2) task-capacity-based pre-selection of the policies.
The results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
arXiv Detail & Related papers (2021-06-03T10:42:49Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate reinforcement learning (RL).
Our framework learns when and which source policy is best to reuse for the target policy, and when to terminate the reuse.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.