On First-Order Meta-Reinforcement Learning with Moreau Envelopes
- URL: http://arxiv.org/abs/2305.12216v1
- Date: Sat, 20 May 2023 15:46:55 GMT
- Title: On First-Order Meta-Reinforcement Learning with Moreau Envelopes
- Authors: Mohammad Taha Toghani, Sebastian Perez-Salazar, César A. Uribe
- Abstract summary: Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks.
We propose Moreau Envelope Meta-Reinforcement Learning (MEMRL), a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy adjustable to the environment of each individual task.
We show the effectiveness of MEMRL on a multi-task 2D-navigation problem.
- Score: 1.519321208145928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-Reinforcement Learning (MRL) is a promising framework for training
agents that can quickly adapt to new environments and tasks. In this work, we
study the MRL problem under the policy gradient formulation, where we propose a
novel algorithm that uses Moreau envelope surrogate regularizers to jointly
learn a meta-policy that is adjustable to the environment of each individual
task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning
(MEMRL), learns a meta-policy that can adapt to a distribution of tasks by
efficiently updating the policy parameters using a combination of
gradient-based optimization and Moreau Envelope regularization. Moreau
Envelopes provide a smooth approximation of the policy optimization problem,
which enables us to apply standard optimization techniques and converge to an
appropriate stationary point. We provide a detailed analysis of the MEMRL
algorithm, where we show a sublinear convergence rate to a first-order
stationary point for non-convex policy gradient optimization. We finally show
the effectiveness of MEMRL on a multi-task 2D-navigation problem.
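For readers less familiar with Moreau envelopes, the block below states the standard definition and the resulting smoothed meta-objective. The notation (per-task loss f_i, e.g. a negative expected return, meta-parameters theta, smoothing parameter lambda) is illustrative and may differ from the paper's exact formulation.

```latex
% Standard Moreau-envelope construction (illustrative notation, not verbatim from the paper).
% f_i is the i-th task's loss, e.g. the negative expected return of the policy with parameters w.
\begin{align}
  M_{\lambda} f_i(\theta)
    &:= \min_{w \in \mathbb{R}^d}
        \Big\{ f_i(w) + \tfrac{1}{2\lambda}\,\lVert w - \theta \rVert^2 \Big\},
  &
  \operatorname{prox}_{\lambda f_i}(\theta)
    &:= \operatorname*{arg\,min}_{w \in \mathbb{R}^d}
        \Big\{ f_i(w) + \tfrac{1}{2\lambda}\,\lVert w - \theta \rVert^2 \Big\},
  \\
  \min_{\theta \in \mathbb{R}^d}\; F(\theta)
    &:= \frac{1}{n} \sum_{i=1}^{n} M_{\lambda} f_i(\theta),
  &
  \nabla M_{\lambda} f_i(\theta)
    &= \frac{1}{\lambda}\big(\theta - \operatorname{prox}_{\lambda f_i}(\theta)\big),
\end{align}
% where the gradient identity holds for suitably small lambda (e.g. when f_i is weakly convex).
```

The following is a minimal runnable sketch of a Moreau-envelope-style proximal meta-update, not the authors' algorithm: each task's RL objective is replaced by a synthetic quadratic loss so the script is self-contained, and in an actual MRL setting the inner gradient would be a policy-gradient estimate.

```python
# Illustrative sketch of a Moreau-envelope meta-update over tasks (NOT the paper's code).
# Each task loss is a toy quadratic f_i(w) = 0.5 * ||w - c_i||^2 standing in for the
# (negative) expected return; grad_f would be a policy-gradient estimator in real MRL.
import numpy as np

rng = np.random.default_rng(0)
dim, n_tasks = 5, 8
centers = rng.normal(size=(n_tasks, dim))   # stand-ins for per-task optima

def grad_f(i, w):
    """Gradient of the i-th toy task loss (a policy-gradient estimate in real MRL)."""
    return w - centers[i]

lam, inner_lr, meta_lr = 0.5, 0.1, 0.5
theta = np.zeros(dim)                       # meta-policy parameters

for it in range(200):
    prox_points = []
    for i in range(n_tasks):
        # Approximately solve the proximal subproblem
        #   w_i ~= argmin_w f_i(w) + (1 / (2 * lam)) * ||w - theta||^2
        w = theta.copy()
        for _ in range(10):
            w -= inner_lr * (grad_f(i, w) + (w - theta) / lam)
        prox_points.append(w)
    # Moreau-envelope gradient (theta - prox)/lam, averaged over the task distribution.
    meta_grad = np.mean([(theta - w) / lam for w in prox_points], axis=0)
    theta -= meta_lr * meta_grad

print("meta-parameters after training:", np.round(theta, 3))
print("average of task optima:        ", np.round(centers.mean(axis=0), 3))
```

For these quadratic toy losses the meta-parameters should converge toward the average of the task optima, which the two print statements let you check.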
Related papers
- Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator [9.900800253949512]
We develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation.
We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.
arXiv Detail & Related papers (2024-10-13T05:17:58Z)
- HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies [26.225293232912716]
This paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm.
It aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies to maximize the average return of each task.
Experiment results show that the proposed algorithms outperform other previous Meta-RL algorithms on Gym and MuJoCo suites.
arXiv Detail & Related papers (2022-09-21T02:27:56Z)
- Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning [17.916366827429034]
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Improving Actor-Critic Reinforcement Learning via Hamiltonian Policy [11.34520632697191]
Approximating optimal policies in reinforcement learning (RL) is necessary in many real-world scenarios.
In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate policy optimization with HMC.
We show that the proposed approach is a data-efficient and easy-to-implement improvement over previous policy optimization methods.
arXiv Detail & Related papers (2021-03-22T17:26:43Z)
- Near Optimal Policy Optimization via REPS [33.992374484681704]
Relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
There exist no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses generative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy (the standard closed-form REPS update is sketched after this list).
arXiv Detail & Related papers (2021-03-17T16:22:59Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory (a generic kernel-regression form of such analytic adaptation is sketched after this list).
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
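Two background sketches referenced in the list above. First, the standard REPS-style trust-region update: maximizing expected advantage subject to a KL constraint to the previous policy yields an exponential reweighting, with temperature eta given by the dual of the constraint. This is generic background rather than the specific construction of the cited paper, and presentations vary (e.g. conditional vs. state-action KL).

```latex
% Standard REPS-style update (background sketch, not specific to the cited paper).
\begin{align}
  \max_{\pi}\; \mathbb{E}_{s,a \sim \pi}\!\left[ A^{\pi_{\mathrm{old}}}(s,a) \right]
  \quad \text{s.t.} \quad
  \mathrm{KL}\!\left(\pi \,\|\, \pi_{\mathrm{old}}\right) \le \epsilon,
  \\
  \pi_{\mathrm{new}}(a \mid s) \;\propto\;
  \pi_{\mathrm{old}}(a \mid s)\,
  \exp\!\left( \frac{A^{\pi_{\mathrm{old}}}(s,a)}{\eta} \right),
\end{align}
% where eta > 0 is the Lagrange multiplier of the KL constraint,
% obtained by minimizing the corresponding dual function.
```

Second, a generic kernel-ridge-regression form of analytic adaptation with a Neural Tangent Kernel on a task's support set (X, Y). The ridge parameter lambda and the linearization around the meta-model f_theta are assumptions of this sketch, not necessarily the cited paper's exact construction.

```latex
% Kernel-ridge-regression form of analytic NTK adaptation (background sketch).
\begin{equation}
  f_{\mathrm{adapted}}(x)
  = f_{\theta}(x)
  + K_{\theta}(x, X)\,\big(K_{\theta}(X, X) + \lambda I\big)^{-1}\big(Y - f_{\theta}(X)\big),
\end{equation}
% where K_theta(x, x') = <grad_theta f_theta(x), grad_theta f_theta(x')> is the
% Neural Tangent Kernel of the meta-model and (X, Y) is the task's support set.
```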