Efficient Meta Reinforcement Learning for Preference-based Fast
Adaptation
- URL: http://arxiv.org/abs/2211.10861v1
- Date: Sun, 20 Nov 2022 03:55:09 GMT
- Title: Efficient Meta Reinforcement Learning for Preference-based Fast
Adaptation
- Authors: Zhizhou Ren, Anji Liu, Yitao Liang, Jian Peng, Jianzhu Ma
- Abstract summary: We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
- Score: 17.165083095799712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning new task-specific skills from a few trials is a fundamental
challenge for artificial intelligence. Meta reinforcement learning (meta-RL)
tackles this problem by learning transferable policies that support few-shot
adaptation to unseen tasks. Despite recent advances in meta-RL, most existing
methods require access to the environmental reward function of new tasks to
infer the task objective, which is not realistic in many practical
applications. To bridge this gap, we study the problem of few-shot adaptation
in the context of human-in-the-loop reinforcement learning. We develop a
meta-RL algorithm that enables fast policy adaptation with preference-based
feedback. The agent can adapt to new tasks by querying a human's preference
between behavior trajectories instead of using per-step numeric rewards. By
extending techniques from information theory, our approach can design query
sequences to maximize the information gain from human interactions while
tolerating the inherent error of a non-expert human oracle. In experiments, we
extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a
variety of meta-RL benchmark tasks and demonstrate substantial improvement over
baseline algorithms in terms of both feedback efficiency and error tolerance.
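The abstract's core mechanism, choosing preference queries that maximize information gain while tolerating a noisy oracle, can be sketched as follows. This is a minimal illustration, not the paper's ANOLE implementation: it assumes a discrete set of candidate reward hypotheses, Bradley-Terry-style preference probabilities per hypothesis, and a fixed oracle error rate `eps`; all names and parameters are hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): maintain a discrete posterior
# over candidate task-reward hypotheses, pick the trajectory pair whose noisy
# preference answer is expected to reduce posterior entropy the most, then do a
# Bayes update that accounts for the oracle's error rate.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def answer_likelihood(pref_probs, eps):
    """P(oracle says 'trajectory A preferred') per hypothesis, with error rate eps."""
    return (1 - eps) * pref_probs + eps * (1 - pref_probs)

def expected_info_gain(posterior, pref_probs, eps):
    """Expected reduction in posterior entropy for one candidate query.

    pref_probs[k] = P(trajectory A preferred over B | hypothesis k),
    e.g. from a Bradley-Terry model on the returns each hypothesis assigns.
    """
    p_yes = answer_likelihood(pref_probs, eps)           # per-hypothesis
    marg_yes = float(np.dot(posterior, p_yes))           # P(answer = "A preferred")
    post_yes = posterior * p_yes / marg_yes
    post_no = posterior * (1 - p_yes) / (1 - marg_yes)
    h_after = marg_yes * entropy(post_yes) + (1 - marg_yes) * entropy(post_no)
    return entropy(posterior) - h_after

def select_query(posterior, candidate_pref_probs, eps):
    """Pick the trajectory pair with maximal expected information gain."""
    gains = [expected_info_gain(posterior, pp, eps) for pp in candidate_pref_probs]
    return int(np.argmax(gains))

def update_posterior(posterior, pref_probs, answer_is_yes, eps):
    """Bayesian update that tolerates a noisy oracle (error rate eps)."""
    p_yes = answer_likelihood(pref_probs, eps)
    lik = p_yes if answer_is_yes else 1 - p_yes
    post = posterior * lik
    return post / post.sum()

# Toy usage: 4 reward hypotheses, 3 candidate trajectory pairs (random numbers
# stand in for model-predicted preference probabilities).
rng = np.random.default_rng(0)
posterior = np.full(4, 0.25)
candidates = rng.uniform(0.1, 0.9, size=(3, 4))
q = select_query(posterior, candidates, eps=0.1)
posterior = update_posterior(posterior, candidates[q], answer_is_yes=True, eps=0.1)
```

Each query is scored by the expected drop in posterior entropy over the hypotheses, and the Bayes update explicitly mixes in the oracle error rate so that a single wrong answer cannot collapse the posterior.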
Related papers
- Neuroevolution is a Competitive Alternative to Reinforcement Learning
for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also adapts quickly to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple policy training from task-inference learning and efficiently train the inference mechanism using an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method accurately infers the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.