Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces
- URL: http://arxiv.org/abs/2101.01883v1
- Date: Wed, 6 Jan 2021 05:51:38 GMT
- Title: Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces
- Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-reinforcement learning (meta-RL) addresses the sample inefficiency
of deep reinforcement learning (RL) by reusing experience obtained in past tasks
to solve a new task.
However, most meta-RL methods require partially or fully on-policy data;
that is, they cannot reuse data collected by past policies, which limits how
much their sample efficiency can improve.
To alleviate this problem, we propose a novel off-policy meta-RL method,
embedding learning and evaluation of uncertainty (ELUE).
An ELUE agent is characterized by learning a feature embedding space shared
among tasks.
It learns a belief model over the embedding space and a belief-conditional
policy and Q-function.
Then, for a new task, it collects data with the pretrained policy and updates
its belief using the belief model.
Thanks to this belief update, performance can improve with only a small
amount of data.
In addition, once enough data are available, it updates the parameters of the
neural networks to adjust the pretrained relationships.
We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through
experiments on meta-RL benchmarks.
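The abstract's point that on-policy methods cannot reuse data collected by past policies is easiest to see with a replay buffer, the standard off-policy mechanism: transitions stay stored after the policy that produced them has changed, and later updates sample them freely. The sketch below is a generic illustration under stated assumptions, not ELUE's implementation; the names `Transition` and `ReplayBuffer`, the per-task keying, and the `policy_version` field are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List, NamedTuple


class Transition(NamedTuple):
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]
    done: bool
    policy_version: int  # hypothetical: index of the past policy that collected it


class ReplayBuffer:
    """Hypothetical per-task replay buffer (illustrative, not ELUE's code)."""

    def __init__(self, capacity_per_task: int = 100_000) -> None:
        self.capacity = capacity_per_task
        self.buffers: Dict[int, Deque[Transition]] = {}

    def add(self, task_id: int, transition: Transition) -> None:
        # deque(maxlen=...) evicts the oldest transition once the buffer is full.
        buf = self.buffers.setdefault(task_id, deque(maxlen=self.capacity))
        buf.append(transition)

    def sample(self, task_id: int, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, ignoring which policy produced
        # each transition: reusing old data like this is what on-policy
        # methods cannot do.
        buf = self.buffers[task_id]
        return random.sample(list(buf), min(batch_size, len(buf)))
```

Keying buffers by task lets a meta-training loop first pick a training task and then draw a batch from that task's history, regardless of how many policy updates have happened since the data were collected.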
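The abstract says the agent updates its belief over the embedding space as data from a new task arrive, and that this update alone can improve performance with little data. It does not state the form of the belief model. The sketch below assumes a diagonal Gaussian belief updated by multiplying in one Gaussian factor per transition, a common choice in context-based meta-RL; `update_belief` and the encoder that would produce the factors are hypothetical.

```python
import numpy as np


def update_belief(prior_mean, prior_var, factor_means, factor_vars):
    """Posterior over a task embedding z as a product of diagonal Gaussians.

    Assumption (not stated in the abstract): each observed transition is mapped
    by an encoder to a Gaussian factor N(factor_mean, factor_var) over z, and
    the belief is the normalized product of the prior and all factors.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    factor_means = np.atleast_2d(np.asarray(factor_means, dtype=float))
    factor_vars = np.atleast_2d(np.asarray(factor_vars, dtype=float))

    # Precisions (inverse variances) add under a product of Gaussians.
    post_precision = 1.0 / prior_var + np.sum(1.0 / factor_vars, axis=0)
    post_var = 1.0 / post_precision
    # The posterior mean is the precision-weighted average of all means.
    post_mean = post_var * (
        prior_mean / prior_var + np.sum(factor_means / factor_vars, axis=0)
    )
    return post_mean, post_var


if __name__ == "__main__":
    dim = 2
    mean, var = np.zeros(dim), np.ones(dim)  # standard-normal prior
    for step in range(1, 4):
        # Hypothetical encoder output for one new transition.
        mean, var = update_belief(mean, var, np.full(dim, 1.0), np.ones(dim))
        print(step, mean.round(3), var.round(3))
```

With unit-variance factors centered at 1.0, the belief variance shrinks from 0.5 to about 0.333 to 0.25 over three transitions while the mean moves toward 1.0, which is one way an agent could quantify how much uncertainty remains. Updating one transition at a time gives the same posterior as a single batch update over all transitions.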
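A belief-conditional Q-function, as mentioned in the abstract, takes the agent's current belief as an extra input alongside the state and action. How ELUE encodes the belief is not specified here; the sketch below assumes the belief is summarized by the mean and variance of a diagonal Gaussian over the embedding and concatenated with the state and action. The class `BeliefConditionedQ` and its layer sizes are hypothetical, and training is omitted.

```python
import numpy as np


class BeliefConditionedQ:
    """Hypothetical two-layer Q-network Q(s, a, b) conditioned on a belief b.

    Assumption: b is summarized by (mean, variance) of a diagonal Gaussian over
    the task embedding. Weights are randomly initialized; no training shown.
    """

    def __init__(self, state_dim, action_dim, embed_dim, hidden_dim=32, seed=0):
        input_dim = state_dim + action_dim + 2 * embed_dim
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(input_dim), size=(input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=hidden_dim)
        self.b2 = 0.0

    def __call__(self, state, action, belief_mean, belief_var):
        # Including the variance gives the critic a different input for a
        # confident belief than for an uncertain one with the same mean.
        x = np.concatenate([state, action, belief_mean, belief_var])
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return float(h @ self.w2 + self.b2)
```

A belief-conditional policy could be built the same way, taking the state and belief statistics as input and producing an action instead of a scalar value.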