On the Convergence Theory of Meta Reinforcement Learning with
Personalized Policies
- URL: http://arxiv.org/abs/2209.10072v1
- Date: Wed, 21 Sep 2022 02:27:56 GMT
- Title: On the Convergence Theory of Meta Reinforcement Learning with
Personalized Policies
- Authors: Haozhi Wang, Qing Wang, Yunfeng Shao, Dong Li, Jianye Hao, Yinchuan Li
- Abstract summary: This paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm.
It aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies to maximize the average return of each task.
Experimental results show that the proposed algorithms outperform previous Meta-RL algorithms on the Gym and MuJoCo suites.
- Score: 26.225293232912716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern meta-reinforcement learning (Meta-RL) methods are mainly developed
based on model-agnostic meta-learning, which performs policy gradient steps
across tasks to maximize policy performance. However, the gradient conflict
problem is still poorly understood in Meta-RL, which may lead to performance
degradation when encountering distinct tasks. To tackle this challenge, this
paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm, which
aggregates task-specific personalized policies to update a meta-policy used for
all tasks, while maintaining personalized policies to maximize the average
return of each task under the constraint of the meta-policy. We also provide
the theoretical analysis under the tabular setting, which demonstrates the
convergence of our pMeta-RL algorithm. Moreover, we extend the proposed
pMeta-RL algorithm to a deep network version based on soft actor-critic, making
it suitable for continuous control tasks. Experimental results show that the
proposed algorithms outperform previous Meta-RL algorithms on the Gym and
MuJoCo suites.
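
The abstract describes an alternating scheme: each task maintains a personalized policy that is optimized under a constraint tying it to a shared meta-policy, and the meta-policy is then updated by aggregating the personalized policies. The sketch below is only an illustration of that structure in the tabular setting, not the authors' implementation: the Q-table parameterization, the quadratic proximal penalty with weight `lam`, and the simple averaging aggregation are assumptions made for the example.

```python
import numpy as np

# Illustrative sketch of a pMeta-RL-style alternation in a tabular setting.
# Assumptions (not taken from the paper): policies are represented by Q-tables,
# personalization is enforced via a quadratic proximal penalty toward the meta
# Q-table, and the meta-policy is updated by averaging personalized Q-tables.

def personalized_update(q_personal, q_meta, transitions,
                        alpha=0.1, gamma=0.99, lam=1.0):
    """One round of per-task Q-learning, regularized toward the meta Q-table."""
    for (s, a, r, s_next) in transitions:
        td_target = r + gamma * np.max(q_personal[s_next])
        td_error = td_target - q_personal[s, a]
        # Gradient of 0.5 * lam * (q_personal - q_meta)^2: keeps the
        # personalized policy close to the meta-policy (the constraint).
        prox_grad = lam * (q_personal[s, a] - q_meta[s, a])
        q_personal[s, a] += alpha * (td_error - prox_grad)
    return q_personal

def aggregate(q_personals):
    """Meta-policy update: aggregate task-specific personalized Q-tables."""
    return np.mean(q_personals, axis=0)

# Toy usage: 3 tasks, 5 states, 2 actions, synthetic random transitions.
n_tasks, n_states, n_actions = 3, 5, 2
rng = np.random.default_rng(0)
q_meta = np.zeros((n_states, n_actions))
q_personals = [np.zeros((n_states, n_actions)) for _ in range(n_tasks)]

for _ in range(10):  # meta-rounds
    for k in range(n_tasks):
        transitions = [(rng.integers(n_states), rng.integers(n_actions),
                        rng.normal(), rng.integers(n_states))
                       for _ in range(50)]
        q_personals[k] = personalized_update(q_personals[k], q_meta, transitions)
    q_meta = aggregate(q_personals)
```

In the deep variant mentioned in the abstract, the Q-tables would be replaced by soft actor-critic networks; the sketch only mirrors the tabular alternation that the convergence analysis concerns.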
Related papers
- Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator [9.900800253949512]
We develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation.
We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.
arXiv Detail & Related papers (2024-10-13T05:17:58Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Meta Generative Flow Networks with Personalization for Task-Specific Adaptation [8.830531142309733]
Multi-task reinforcement learning and meta-reinforcement learning tend to focus on tasks with higher rewards and more frequent occurrences.
GFlowNets can be integrated into meta-learning algorithms (GFlowMeta) by leveraging the advantages of GFlowNets on tasks with sparse rewards.
This paper proposes a personalized approach named pGFlowMeta, which combines task-specific personalized policies with a meta policy.
arXiv Detail & Related papers (2023-06-16T10:18:38Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled level of robustness.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline Meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Curriculum in Gradient-Based Meta-Reinforcement Learning [10.447238563837173]
We show that gradient-based meta-learners are sensitive to task distributions.
With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability.
arXiv Detail & Related papers (2020-02-19T01:40:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.