Meta-Model-Based Meta-Policy Optimization
- URL: http://arxiv.org/abs/2006.02608v5
- Date: Mon, 11 Oct 2021 11:59:10 GMT
- Title: Meta-Model-Based Meta-Policy Optimization
- Authors: Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa,
Takashi Onishi, Yoshimasa Tsuruoka
- Abstract summary: We propose a model-based meta-reinforcement learning (RL) method with a performance guarantee.
We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
- Score: 19.468989399627638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based meta-reinforcement learning (RL) methods have recently been shown
to be a promising approach to improving the sample efficiency of RL in
multi-task settings. However, the theoretical understanding of those methods is
yet to be established, and there is currently no theoretical guarantee of their
performance in a real-world environment. In this paper, we analyze the
performance guarantee of model-based meta-RL methods by extending the theorems
proposed by Janner et al. (2019). On the basis of our theoretical results, we
propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL
method with a performance guarantee. We demonstrate that M3PO outperforms
existing meta-RL methods in continuous-control benchmarks.
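M3PO builds on the analysis of Janner et al. (2019), in which a learned dynamics model generates short branched rollouts from real states and the policy is optimized on that model-generated data; M3PO lifts this scheme to the multi-task (meta) setting. The snippet below is a minimal, self-contained sketch of how such a loop can be organized, not the authors' implementation: the toy tasks, the per-task least-squares model, the branch length of 3, and the placeholder policy update are assumptions made only for illustration.

```python
# Illustrative sketch only: a generic MBPO-style loop lifted to a multi-task
# (meta-RL) setting. The toy tasks, per-task least-squares model, branch length,
# and placeholder policy update are assumptions for illustration, not M3PO itself.
import numpy as np

rng = np.random.default_rng(0)

def make_task(goal):
    """Toy 1-D task: the state moves by the action; reward is -|state - goal|."""
    def step(s, a):
        s_next = s + a
        return s_next, -abs(s_next - goal)
    return step

tasks = [make_task(g) for g in (-1.0, 0.5, 2.0)]       # small task distribution

def fit_model(transitions):
    """Fit s' ~ w0*s + w1*a + w2 by least squares from one task's real data."""
    X = np.array([[s, a, 1.0] for s, a, _ in transitions])
    y = np.array([s_next for _, _, s_next in transitions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda s, a: w[0] * s + w[1] * a + w[2]

policy_gain = 0.1                                      # crude shared "meta-policy"

for iteration in range(5):
    model_data = []
    for step_fn in tasks:
        # 1) Collect a small amount of real experience in each task.
        real, s = [], 0.0
        for _ in range(20):
            a = -policy_gain * s + rng.normal(scale=0.3)
            s_next, _ = step_fn(s, a)
            real.append((s, a, s_next))
            s = s_next
        # 2) Fit this task's dynamics model from the real data.
        model = fit_model(real)
        # 3) Branch short rollouts (length 3) from real states under the model.
        for s0, _, _ in real:
            s_m = s0
            for _ in range(3):
                a = -policy_gain * s_m + rng.normal(scale=0.3)
                s_m = model(s_m, a)
                model_data.append((s_m, a))
    # 4) Placeholder policy update on the model-generated data; in the actual
    #    method this is a full meta-policy optimization step.
    policy_gain = min(1.0, policy_gain + 0.05)

print("final policy gain:", policy_gain, "model transitions:", len(model_data))
```

In the actual method the model is a meta-learned neural network and the policy update is a proper meta-policy optimization step; the sketch only mirrors the data flow (real rollouts, model fitting, short model rollouts, policy update).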
Related papers
- Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator [9.900800253949512]
We develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation.
We empirically validate the correctness of the derived upper bounds and demonstrate that the proposed algorithm outperforms benchmark methods; a schematic sketch of the bilevel adaptation follows this entry.
arXiv Detail & Related papers (2024-10-13T05:17:58Z)
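The BO-MRL entry above formulates meta-RL as a bilevel problem: an inner loop adapts a task-specific policy starting from a learned meta-prior, and an outer loop updates the meta-prior based on how well the adapted policies perform. The sketch below illustrates that bilevel structure on toy quadratic objectives with a first-order (FOMAML-style) outer gradient; the losses, learning rates, and first-order approximation are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative bilevel (meta-prior / task-adaptation) sketch on toy quadratic
# objectives; not the BO-MRL algorithm itself.
import numpy as np

task_optima = np.array([-2.0, 0.0, 3.0])      # each task wants theta near its own optimum

def task_loss_grad(theta, opt):
    return theta - opt                         # gradient of 0.5 * (theta - opt)^2

meta_prior = 0.0                               # shared initialization (the "meta-prior")
inner_lr, outer_lr, inner_steps = 0.1, 0.05, 5

for _ in range(200):
    outer_grad = 0.0
    for opt in task_optima:
        # Inner loop: adapt a task-specific parameter starting from the meta-prior.
        theta = meta_prior
        for _ in range(inner_steps):
            theta -= inner_lr * task_loss_grad(theta, opt)
        # Outer-loop contribution: gradient of the adapted loss, treated as a
        # first-order approximation of the meta-gradient (as in FOMAML).
        outer_grad += task_loss_grad(theta, opt)
    meta_prior -= outer_lr * outer_grad / len(task_optima)

print("learned meta-prior:", round(meta_prior, 3))   # ends up near the task average
```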
- Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method [0.0]
This paper presents a novel reinforcement learning approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) with an ensemble inference method.
The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes; a toy illustration of the action-masking idea follows this entry.
arXiv Detail & Related papers (2024-03-21T03:42:39Z)
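The HAAM-RL entry above combines reinforcement learning with heuristic action masking, i.e., actions ruled out by a heuristic are removed before the agent samples. The following standalone toy shows the generic masking step over a discrete action distribution; the logits and mask are made up, and nothing here reflects the paper's ensemble inference method.

```python
# Generic action-masking illustration: forbid actions a heuristic declares
# invalid, renormalize, then sample. Standalone toy, not the HAAM-RL code.
import numpy as np

rng = np.random.default_rng(0)

def masked_sample(logits, valid_mask):
    """Sample an action index, restricted to entries where valid_mask is True."""
    logits = np.where(valid_mask, logits, -np.inf)    # masked actions get zero probability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([1.2, 0.3, -0.5, 2.0])              # policy scores for 4 actions
valid = np.array([True, False, True, False])          # heuristic: only actions 0 and 2 allowed
print(masked_sample(logits, valid))                    # always returns 0 or 2
```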
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning [18.82398325614491]
We propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods.
We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency.
In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
arXiv Detail & Related papers (2024-03-14T20:40:36Z)
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill the results of model-based planning into the policy; a toy version of this distillation step follows this entry.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
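The entry above distills action sequences produced by a model-based planner into a reactive policy. Below is a toy version of that distillation step under assumed components: a brute-force planner over a known 1-D model produces (state, action) pairs, and a linear policy is regressed onto them. The planner, model, and policy form are placeholders, not the paper's method.

```python
# Toy distillation-from-planning sketch: a brute-force planner picks actions
# under a known 1-D model, and a linear policy a = k*s is regressed onto the
# planner's (state, action) pairs. All components are placeholders.
import numpy as np

def dynamics(s, a):
    return 0.9 * s + a                                  # known toy model

def plan_action(s, horizon=3, candidates=np.linspace(-1, 1, 21)):
    """Return the first action whose (crude) plan best keeps |state| small."""
    best_a, best_cost = 0.0, np.inf
    for a0 in candidates:
        s_sim, cost, a = s, 0.0, a0
        for _ in range(horizon):
            s_sim = dynamics(s_sim, a)
            cost += abs(s_sim)
            a = 0.0                                     # hold zero after the first step
        if cost < best_cost:
            best_a, best_cost = a0, cost
    return best_a

# Collect planner demonstrations, then distill them into the linear policy.
states = np.linspace(-1, 1, 41)
actions = np.array([plan_action(s) for s in states])
k = np.dot(states, actions) / np.dot(states, states)    # least-squares slope
print("distilled policy gain:", round(k, 3))             # close to -0.9 for this model
```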
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL demonstrates improved sample efficiency and performance; a toy illustration of a quasimetric follows this entry.
arXiv Detail & Related papers (2023-04-03T17:59:58Z)
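The QRL entry above relies on quasimetric models: distance functions that are nonnegative, zero on the diagonal, and satisfy the triangle inequality, but need not be symmetric (reaching goal B from state A may cost more than the reverse). The snippet below shows one simple quasimetric parameterization over a fixed random feature map, purely to make the notion concrete; it is not the architecture or training objective used in QRL.

```python
# Toy quasimetric: d(x, y) = sum_i max(0, f_i(y) - f_i(x)) over feature coordinates.
# This is nonnegative, d(x, x) = 0, and satisfies the triangle inequality, but is
# asymmetric -- the defining properties of a quasimetric. Generic illustration only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))                    # a fixed random "feature map" f(x) = W @ x

def features(x):
    return W @ np.asarray(x, dtype=float)

def quasimetric(x, y):
    return np.sum(np.maximum(0.0, features(y) - features(x)))

a, b = [0.0, 0.0, 0.0], [1.0, -0.5, 2.0]
print(quasimetric(a, b), quasimetric(b, a))    # generally different values
print(quasimetric(a, a))                       # exactly 0.0
```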
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns; a toy version of such a shift-gated update follows this entry.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
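The entry above studies when the model in MBRL should be updated by relating model shifts to performance change. The toy gate below accepts a candidate model refit only when an estimated shift on recent states stays under a threshold; the shift measure and threshold are illustrative stand-ins for the paper's derived bounds.

```python
# Toy "when to update" gate: accept a new dynamics model only if its predictions
# have not shifted too far from the current model on recent states. The shift
# measure and threshold are illustrative, not the paper's derived bounds.
import numpy as np

rng = np.random.default_rng(0)

def model_shift(old_param, new_param, states):
    """Mean absolute difference between the two linear models' predictions."""
    return np.mean(np.abs(states * old_param - states * new_param))

current = 0.9                        # current model: s' = 0.9 * s
recent_states = rng.normal(size=100)
shift_threshold = 0.05

for candidate in (0.92, 1.3):        # two candidate refits of the model
    shift = model_shift(current, candidate, recent_states)
    if shift <= shift_threshold:
        print(f"accept refit {candidate} (shift {shift:.3f})")
        current = candidate
    else:
        print(f"defer refit {candidate} (shift {shift:.3f} too large)")
```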
- On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies [26.225293232912716]
This paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm.
It aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies to maximize the average return of each task.
Experimental results show that the proposed algorithms outperform previous meta-RL algorithms on the Gym and MuJoCo suites; a toy sketch of the policy-aggregation step follows this entry.
arXiv Detail & Related papers (2022-09-21T02:27:56Z)
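The pMeta-RL entry above keeps a personalized policy per task and aggregates them into a shared meta-policy, while a constraint keeps each personalized policy close to that meta-policy. The sketch below is a parameter-space caricature of the aggregate-and-personalize cycle; the averaging rule, proximity penalty, and toy task optima are assumptions for illustration only.

```python
# Toy aggregate-and-personalize cycle: each task keeps a personalized parameter
# pulled toward its own optimum, the meta-policy is the average of the
# personalized parameters, and a penalty keeps personalized parameters near the
# meta-policy. Placeholder rules only; not the pMeta-RL algorithm.
import numpy as np

task_optima = np.array([-1.0, 0.5, 2.0])   # per-task best parameters (toy)
personalized = np.zeros(3)                 # one personalized parameter per task
meta = 0.0                                 # shared meta-policy parameter
lr, penalty = 0.1, 0.5

for _ in range(100):
    # Personalization step: descend each task's loss plus a proximity penalty.
    grads = (personalized - task_optima) + penalty * (personalized - meta)
    personalized -= lr * grads
    # Aggregation step: the meta-policy is the average personalized parameter.
    meta = personalized.mean()

print("meta:", round(meta, 3), "personalized:", np.round(personalized, 3))
```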
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves upon the sample efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
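The MerPO entry above achieves guaranteed improvement by regularizing offline task-policy learning toward both the behavior policy that generated the data and the learned meta-policy. The snippet below is a toy parameter-space caricature of that double regularization; the quadratic penalties and all weights are made up and only indicate where the learned policy ends up relative to the three reference points.

```python
# Toy caricature of doubly regularized offline adaptation: improve a task
# objective while being penalized for straying from both the behavior policy
# and the meta-policy. Quadratic penalties and weights are illustrative only.
import numpy as np

task_optimum = 2.0          # where the (unknown) task return is maximized (toy)
behavior_policy = 0.3       # parameter of the policy that collected the offline data
meta_policy = 0.8           # parameter learned across previous tasks
alpha, beta, lr = 1.0, 1.0, 0.05

theta = meta_policy         # start adaptation from the meta-policy
for _ in range(500):
    grad = (theta - task_optimum)                 # task-loss gradient (toy quadratic)
    grad += alpha * (theta - behavior_policy)     # stay close to the behavior policy
    grad += beta * (theta - meta_policy)          # stay close to the meta-policy
    theta -= lr * grad

# theta settles between the task optimum and the two reference policies.
print(round(theta, 3))      # (2.0 + 0.3 + 0.8) / 3 with these unit weights
```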
This list is automatically generated from the titles and abstracts of the papers in this site.