Decoupling Exploration and Exploitation for Meta-Reinforcement Learning
without Sacrifices
- URL: http://arxiv.org/abs/2008.02790v4
- Date: Fri, 12 Nov 2021 02:08:50 GMT
- Title: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning
without Sacrifices
- Authors: Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
- Abstract summary: Meta-reinforcement learning (meta-RL) builds agents that can quickly learn new tasks by leveraging prior experience on related tasks.
In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance.
We present DREAM, which avoids local optima in end-to-end training, without sacrificing optimal exploration.
- Score: 132.49849640628727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of meta-reinforcement learning (meta-RL) is to build agents that can
quickly learn new tasks by leveraging prior experience on related tasks.
Learning a new task often requires both exploring to gather task-relevant
information and exploiting this information to solve the task. In principle,
optimal exploration and exploitation can be learned end-to-end by simply
maximizing task performance. However, such meta-RL approaches struggle with
local optima due to a chicken-and-egg problem: learning to explore requires
good exploitation to gauge the exploration's utility, but learning to exploit
requires information gathered via exploration. Optimizing separate objectives
for exploration and exploitation can avoid this problem, but prior meta-RL
exploration objectives yield suboptimal policies that gather information
irrelevant to the task. We alleviate both concerns by constructing an
exploitation objective that automatically identifies task-relevant information
and an exploration objective to recover only this information. This avoids
local optima in end-to-end training, without sacrificing optimal exploration.
Empirically, DREAM substantially outperforms existing approaches on complex
meta-RL problems, such as sparse-reward 3D visual navigation. Videos of DREAM:
https://ezliu.github.io/dream/
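To make the decoupling concrete, below is a minimal sketch of the two objectives described in the abstract, not the authors' implementation: an exploitation objective that shapes a task embedding, and an exploration objective that rewards trajectories from which that embedding can be recovered. All module names, shapes, and the simplified embedding-regression loss (standing in for the paper's information-theoretic objective) are illustrative assumptions.
```python
# A minimal sketch of DREAM-style decoupled objectives -- NOT the authors'
# implementation. Assumptions (all names and shapes are hypothetical): each
# training task has an integer id, the exploitation policy/Q-function is
# conditioned on a task embedding z, and exploration is scored by how well z
# can be recovered from the exploration trajectory. The paper's
# information-theoretic objective is simplified here to embedding regression.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskEncoder(nn.Module):
    """Embeds the task id; trained through the exploitation loss, so the
    embedding retains only task-relevant information."""

    def __init__(self, num_tasks, z_dim):
        super().__init__()
        self.embed = nn.Embedding(num_tasks, z_dim)

    def forward(self, task_id):
        return self.embed(task_id)


class TrajectoryEncoder(nn.Module):
    """Encodes an exploration trajectory into a guess of the task embedding."""

    def __init__(self, obs_dim, z_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, z_dim)

    def forward(self, traj):  # traj: (batch, T, obs_dim)
        _, h = self.rnn(traj)
        return self.head(h[-1])


def exploration_loss(traj_encoder, task_z, exp_traj):
    """Exploration objective: gather trajectories from which the (fixed)
    task embedding can be recovered."""
    pred_z = traj_encoder(exp_traj)
    return F.mse_loss(pred_z, task_z.detach())


# Toy batch: 10 tasks, 8-dim observations, exploration trajectories of length 5.
task_enc = TaskEncoder(num_tasks=10, z_dim=4)
traj_enc = TrajectoryEncoder(obs_dim=8, z_dim=4)
q_net = nn.Sequential(nn.Linear(8 + 4, 32), nn.ReLU(), nn.Linear(32, 1))

task_id = torch.randint(0, 10, (16,))
obs = torch.randn(16, 8)
returns = torch.randn(16, 1)          # placeholder Monte-Carlo returns
exp_traj = torch.randn(16, 5, 8)      # placeholder exploration trajectories

z = task_enc(task_id)
# Exploitation objective: ordinary RL loss (toy value regression), conditioned on z.
loss_exploit = F.mse_loss(q_net(torch.cat([obs, z], dim=-1)), returns)
# Exploration objective: recover z from the exploration trajectory.
loss_explore = exploration_loss(traj_enc, z, exp_traj)
(loss_exploit + loss_explore).backward()
```
In a full implementation, the reduction in this recovery error would serve as the exploration policy's reward, so exploration is trained with standard RL to gather exactly the information the exploitation policy needs; this is what lets the method avoid the chicken-and-egg local optima described in the abstract.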
Related papers
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.