Double Meta-Learning for Data Efficient Policy Optimization in
Non-Stationary Environments
- URL: http://arxiv.org/abs/2011.10714v1
- Date: Sat, 21 Nov 2020 03:19:35 GMT
- Title: Double Meta-Learning for Data Efficient Policy Optimization in
Non-Stationary Environments
- Authors: Elahe Aghapour, Nora Ayanian
- Abstract summary: We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem.
Model-free reinforcement learning algorithms can achieve good performance in multi-task learning at a cost of extensive sampling.
While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties.
- Score: 12.45281856559346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are interested in learning models of non-stationary environments, which
can be framed as a multi-task learning problem. Model-free reinforcement
learning algorithms can achieve good asymptotic performance in multi-task
learning at a cost of extensive sampling, due to their approach, which requires
learning from scratch. While model-based approaches are among the most data
efficient learning algorithms, they still struggle with complex tasks and model
uncertainties. Meta-reinforcement learning addresses the efficiency and
generalization challenges on multi task learning by quickly leveraging the
meta-prior policy for a new task. In this paper, we propose a
meta-reinforcement learning approach to learn the dynamic model of a
non-stationary environment to be used for meta-policy optimization later. Due
to the sample efficiency of model-based learning methods, we are able to
simultaneously train both the meta-model of the non-stationary environment and
the meta-policy until dynamic model convergence. Then, the meta-learned dynamic
model of the environment will generate simulated data for meta-policy
optimization. Our experiment demonstrates that our proposed method can
meta-learn the policy in a non-stationary environment with the data efficiency
of model-based learning approaches while achieving the high asymptotic
performance of model-free meta-reinforcement learning.
Related papers
- Model-based Policy Optimization using Symbolic World Model [46.42871544295734]
The application of learning-based control methods in robotics presents significant challenges.
One is that model-free reinforcement learning algorithms use observation data with low sample efficiency.
We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression.
arXiv Detail & Related papers (2024-07-18T13:49:21Z) - Learning to Unlearn for Robust Machine Unlearning [6.488418950340473]
We introduce a novel Learning-to-Unlearn (LTU) framework to optimize the unlearning process.
LTU includes a meta-optimization scheme that facilitates models to effectively preserve generalizable knowledge.
We also introduce a Gradient Harmonization strategy to align the optimization trajectories for remembering and forgetting.
arXiv Detail & Related papers (2024-07-15T07:36:00Z) - Data-Efficient Task Generalization via Probabilistic Model-based Meta
Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Model-based Meta Reinforcement Learning using Graph Structured Surrogate
Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Model-based Adversarial Meta-Reinforcement Learning [38.28304764312512]
We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL)
AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks.
We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
arXiv Detail & Related papers (2020-06-16T02:21:49Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.