Lifetime policy reuse and the importance of task capacity
- URL: http://arxiv.org/abs/2106.01741v3
- Date: Fri, 20 Oct 2023 14:02:24 GMT
- Title: Lifetime policy reuse and the importance of task capacity
- Authors: David M. Bossens and Adam J. Sobey
- Abstract summary: Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies.
This paper presents two novel contributions: 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies; and 2) the task capacity, a measure of the maximal number of tasks that a policy can accurately solve.
The results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing challenge in artificial intelligence is lifelong
reinforcement learning, where learners are given many tasks in sequence and
must transfer knowledge between tasks while avoiding catastrophic forgetting.
Policy reuse and other multi-policy reinforcement learning techniques can learn
multiple tasks but may generate many policies. This paper presents two novel
contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse
algorithm that avoids generating many policies by optimising a fixed number of
near-optimal policies through a combination of policy optimisation and adaptive
policy selection; and 2) the task capacity, a measure for the maximal number of
tasks that a policy can accurately solve. Comparing two state-of-the-art
base-learners, the results demonstrate the importance of Lifetime Policy Reuse
and task capacity based pre-selection on an 18-task partially observable Pacman
domain and a Cartpole domain of up to 125 tasks.
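To make the abstract's description more concrete, the following is a minimal, non-authoritative sketch of a fixed-size policy library with adaptive policy selection. The class name, the epsilon-greedy selector over running mean returns, and the run_episode_and_update interface are illustrative assumptions, not the authors' implementation.
```python
import random
from collections import defaultdict


class LifetimePolicyReuse:
    """Illustrative sketch of reusing a fixed library of K policies across many tasks.

    For each incoming task, an adaptive selector picks one of the K policies
    (here, epsilon-greedy over the running mean return of each (task, policy) pair),
    and only the selected policy is optimised on that task's experience.
    """

    def __init__(self, policies, epsilon=0.1):
        self.policies = policies               # fixed library of K base-learner policies
        self.epsilon = epsilon
        self.mean_return = defaultdict(float)  # (task_id, policy_idx) -> running mean return
        self.counts = defaultdict(int)

    def select_policy(self, task_id):
        """Adaptive policy selection: explore occasionally, otherwise exploit."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.policies))
        return max(range(len(self.policies)),
                   key=lambda i: self.mean_return[(task_id, i)])

    def train_on_task(self, task_id, env, episodes=100):
        """Interleave policy selection with base-learner updates on the chosen policy."""
        for _ in range(episodes):
            idx = self.select_policy(task_id)
            # assumed base-learner interface: run one episode on `env`, update in place,
            # and return the episode return
            episode_return = self.policies[idx].run_episode_and_update(env)
            key = (task_id, idx)
            self.counts[key] += 1
            self.mean_return[key] += (episode_return - self.mean_return[key]) / self.counts[key]
```
Under this reading, the task capacity of contribution 2) would indicate how many tasks a single policy in the library can be expected to solve accurately, and hence guide the choice of the library size K before training.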
Related papers
- IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
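A hedged sketch of the residual-value idea suggested by the Residual Q-Learning entry above: the customized action value is assumed to decompose additively into the prior policy's value and a learned residual term for the added objective; the function names and the decomposition are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np


def customized_action(state, actions, q_prior, q_residual):
    """Greedy action under a prior value plus a learned residual value (illustrative only).

    q_prior(s, a):    action value associated with the prior policy (assumed given)
    q_residual(s, a): learned value of the additional customization reward
    """
    scores = [q_prior(state, a) + q_residual(state, a) for a in actions]
    return actions[int(np.argmax(scores))]
```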
- On the Value of Myopic Behavior in Policy Reuse [67.37788288093299]
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
In this work, we present a framework called Selective Myopic bEhavior Control (SMEC).
SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions.
arXiv Detail & Related papers (2023-05-28T03:59:37Z)
- Safety-Constrained Policy Transfer with Successor Features [19.754549649781644]
We propose a Constrained Markov Decision Process (CMDP) formulation that enables the transfer of policies and adherence to safety constraints.
Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation.
Our experiments in simulated domains show that our approach is effective; it visits unsafe states less frequently and outperforms alternative state-of-the-art methods when taking safety constraints into account.
arXiv Detail & Related papers (2022-11-10T06:06:36Z)
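A hedged sketch of what generalized policy improvement with a Lagrangian safety term could look like, following the entry above; the per-policy reward and cost value functions and the single multiplier lam are illustrative assumptions, not the paper's equations.
```python
def constrained_gpi_action(state, actions, q_reward, q_cost, lam):
    """Generalized policy improvement with a Lagrangian safety penalty (illustrative).

    q_reward[i](s, a): reward value function of source policy i
    q_cost[i](s, a):   expected cumulative constraint cost under source policy i
    lam:               Lagrange multiplier trading task reward against constraint cost
    """
    def score(action):
        # GPI step: take the best source policy for this state-action pair,
        # after penalising its expected constraint cost.
        return max(qr(state, action) - lam * qc(state, action)
                   for qr, qc in zip(q_reward, q_cost))

    return max(actions, key=score)
```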
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Towards an Understanding of Default Policies in Multitask Policy Optimization [29.806071693039655]
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms.
We take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization.
We then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
arXiv Detail & Related papers (2021-11-04T16:45:15Z)
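A hedged sketch of a regularized policy optimisation loss with a default policy, as discussed in the entry above; the coefficient alpha, the KL direction, and the function signature are illustrative assumptions rather than the paper's formulation.
```python
import torch
import torch.nn.functional as F


def rpo_loss(log_probs, advantages, policy_logits, default_logits, alpha=0.1):
    """Regularized policy optimisation loss in one common form (illustrative).

    A standard policy-gradient term plus a KL penalty that keeps the per-task policy
    close to a shared default policy; alpha and the KL direction are assumptions.
    """
    pg_term = -(advantages.detach() * log_probs).mean()
    kl_term = F.kl_div(F.log_softmax(policy_logits, dim=-1),
                       F.log_softmax(default_logits, dim=-1),
                       reduction="batchmean", log_target=True)
    return pg_term + alpha * kl_term
```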
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
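A hedged sketch of the two-level architecture suggested by the entry above, with a high-level policy that imagines subgoals and a goal-conditioned low-level policy; the network sizes and the way the subgoal is consumed are illustrative assumptions, and the paper may use imagined subgoals only to shape training rather than at execution time.
```python
import torch
import torch.nn as nn


class SubgoalActor(nn.Module):
    """Illustrative two-level actor: a high-level policy imagines an intermediate subgoal
    between the current state and the final goal, and a goal-conditioned low-level policy
    acts toward that subgoal. Layer sizes and the subgoal interface are assumptions."""

    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.high_level = nn.Sequential(     # predicts the imagined subgoal
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim))
        self.low_level = nn.Sequential(      # acts toward the imagined subgoal
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, state, goal):
        subgoal = self.high_level(torch.cat([state, goal], dim=-1))
        action = self.low_level(torch.cat([state, subgoal], dim=-1))
        return action, subgoal
```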
- Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting [26.13332231423652]
We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients.
We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines.
arXiv Detail & Related papers (2020-07-14T13:05:42Z)
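A hedged sketch of one way to factor policy parameters for lifelong policy-gradient learning, as suggested by the title in the entry above; the linear factorisation theta_t = L @ s_t and the joint update are illustrative assumptions, not the paper's algorithm.
```python
import numpy as np


class FactoredPolicyParameters:
    """Illustrative factored parameterisation for lifelong policy-gradient learning:
    each task's policy parameters are a linear combination of a small shared basis,
    theta_t = L @ s_t."""

    def __init__(self, param_dim, num_components, seed=0):
        rng = np.random.default_rng(seed)
        self.L = rng.normal(scale=0.1, size=(param_dim, num_components))  # shared basis
        self.task_coeffs = {}                                             # s_t for each task

    def params_for_task(self, task_id):
        s = self.task_coeffs.setdefault(
            task_id, np.full(self.L.shape[1], 1.0 / self.L.shape[1]))
        return self.L @ s

    def apply_policy_gradient(self, task_id, grad_theta, lr=1e-2):
        """Gradient-ascent step on both factors via the chain rule through theta_t = L @ s_t."""
        s = self.task_coeffs.setdefault(
            task_id, np.full(self.L.shape[1], 1.0 / self.L.shape[1]))
        self.L += lr * np.outer(grad_theta, s)                         # dJ/dL = grad_theta s_t^T
        self.task_coeffs[task_id] = s + lr * (self.L.T @ grad_theta)   # dJ/ds_t = L^T grad_theta
```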
- Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies [34.555500347840805]
We consider the problem of reinforcement learning when provided with a baseline control policy and a set of constraints that the learner must satisfy.
We propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set.
Our algorithm consistently outperforms several state-of-the-art baselines, achieving 10 times fewer constraint violations and 40% higher reward on average.
arXiv Detail & Related papers (2020-06-20T20:20:47Z)
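A hedged, schematic rendering of the alternating loop described in the entry above; each callable stands in for the corresponding sub-update and is an assumption, not the paper's exact operator.
```python
def iterative_safe_policy_optimization(policy, baseline_policy, num_iterations,
                                       reward_step, distance_step, constraint_projection):
    """Schematic of the alternating three-stage loop (illustrative stand-ins only)."""
    for _ in range(num_iterations):
        policy = reward_step(policy)                     # 1) improve expected task return
        policy = distance_step(policy, baseline_policy)  # 2) move closer to the baseline policy
        policy = constraint_projection(policy)           # 3) project onto the constraint-satisfying set
    return policy
```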
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.