Meta-Reinforcement Learning by Tracking Task Non-stationarity
- URL: http://arxiv.org/abs/2105.08834v1
- Date: Tue, 18 May 2021 21:19:41 GMT
- Title: Meta-Reinforcement Learning by Tracking Task Non-stationarity
- Authors: Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli
- Abstract summary: We propose a novel algorithm (TRIO) that optimizes for the future by explicitly tracking the task evolution through time.
Unlike most existing methods, TRIO does not assume Markovian task-evolution processes.
We evaluate our algorithm on different simulated problems and show it outperforms competitive baselines.
- Score: 45.90345116853823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world domains are subject to a structured non-stationarity which
affects the agent's goals and the environmental dynamics. Meta-reinforcement
learning (RL) has been shown to be successful for training agents that quickly adapt
to related tasks. However, most of the existing meta-RL algorithms for
non-stationary domains either make strong assumptions on the task generation
process or require sampling from it at training time. In this paper, we propose
a novel algorithm (TRIO) that optimizes for the future by explicitly tracking
the task evolution through time. At training time, TRIO learns a variational
module to quickly identify latent parameters from experience samples. This
module is learned jointly with an optimal exploration policy that takes task
uncertainty into account. At test time, TRIO tracks the evolution of the latent
parameters online, hence reducing the uncertainty over future tasks and
obtaining fast adaptation through the meta-learned policy. Unlike most existing
methods, TRIO does not assume Markovian task-evolution processes, does not
require information about the non-stationarity at training time, and
captures complex changes occurring in the environment. We evaluate our
algorithm on different simulated problems and show it outperforms competitive
baselines.
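The test-time step described above, tracking how the latent task parameters drift and forecasting the next task, can be illustrated with a small sketch. The abstract does not say which model TRIO uses for tracking, so this sketch assumes Gaussian-process regression over time on the per-task latent means that a variational module would output (the module itself is omitted). The names `rbf_kernel` and `gp_forecast`, the squared-exponential kernel, and all hyperparameters are hypothetical. The predictive variance is one way to quantify the "uncertainty over future tasks" mentioned in the abstract.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time indices."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_forecast(times, latents, t_next, length_scale=2.0, noise=1e-2):
    """Hypothetical tracker: forecast the latent task parameter at t_next.

    times:   shape (n,), time indices of past tasks
    latents: shape (n, d), latent means inferred for those tasks
             (assumed to come from a variational inference module)
    Returns the predictive mean, shape (d,), and a scalar predictive
    variance shared by all dimensions (they use the same kernel).
    """
    times = np.asarray(times, dtype=float)
    latents = np.asarray(latents, dtype=float)
    t_star = np.array([t_next], dtype=float)

    # Constant prior mean per dimension: the average of past latents.
    mu = latents.mean(axis=0)

    # Kernel matrix of past tasks plus observation noise on the diagonal.
    K = rbf_kernel(times, times, length_scale) + noise * np.eye(len(times))
    k_star = rbf_kernel(times, t_star, length_scale)        # (n, 1)
    k_ss = rbf_kernel(t_star, t_star, length_scale)[0, 0]   # prior variance

    # Standard GP posterior via a Cholesky factorisation of K.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, latents - mu))  # (n, d)
    mean = mu + (k_star.T @ alpha)[0]
    v = np.linalg.solve(L, k_star)                           # (n, 1)
    var = float(k_ss - (v ** 2).sum())
    return mean, var


if __name__ == "__main__":
    # Synthetic drifting latents, for illustration only.
    t = np.arange(8)
    z = np.stack([np.sin(0.5 * t), 0.1 * t], axis=1)
    mean, var = gp_forecast(t, z, t_next=8)
    print("forecast:", mean, "variance:", var)
```

The forecast mean would then condition the meta-learned policy on the next task before any new samples arrive. The variance shrinks when the next task is close in time to many tracked ones and grows back toward the prior variance as the forecast horizon lengthens.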
Related papers
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712] (2022-11-20T03:55:09Z)
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743] (2022-06-07T13:24:00Z)
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
- Fully Online Meta-Learning Without Task Boundaries [80.09124768759564] (2022-02-01T07:51:24Z)
We study how meta-learning can be applied to online problems in which task boundaries are not given.
We propose a Fully Online Meta-Learning (FOML) algorithm, which does not require any ground-truth knowledge about the task boundaries.
Our experiments show that FOML was able to learn new tasks faster than the state-of-the-art online learning methods.
- Dynamic Regret Analysis for Online Meta-Learning [0.0] (2021-09-29T12:12:59Z)
The online meta-learning framework has arisen as a powerful tool for the continual lifelong learning setting.
This formulation involves two levels: an outer level, which learns meta-learners, and an inner level, which learns task-specific models.
We establish performance in terms of dynamic regret, which handles changing environments from a global perspective.
We carry out our analyses in a setting, and in expectation prove a logarithmic local dynamic regret which explicitly depends on the total number of iterations.
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103] (2021-08-08T19:32:44Z)
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541] (2020-06-12T13:34:46Z)
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989] (2020-05-06T16:14:48Z)
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876] (2020-01-01T17:34:00Z)
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.