Decoupling Exploration and Exploitation for Meta-Reinforcement Learning
without Sacrifices
- URL: http://arxiv.org/abs/2008.02790v4
- Date: Fri, 12 Nov 2021 02:08:50 GMT
- Title: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning
without Sacrifices
- Authors: Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
- Abstract summary: Meta-reinforcement learning (meta-RL) builds agents that can quickly learn new tasks by leveraging prior experience on related tasks.
In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance.
We present DREAM, which avoids local optima in end-to-end training, without sacrificing optimal exploration.
- Score: 132.49849640628727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of meta-reinforcement learning (meta-RL) is to build agents that can
quickly learn new tasks by leveraging prior experience on related tasks.
Learning a new task often requires both exploring to gather task-relevant
information and exploiting this information to solve the task. In principle,
optimal exploration and exploitation can be learned end-to-end by simply
maximizing task performance. However, such meta-RL approaches struggle with
local optima due to a chicken-and-egg problem: learning to explore requires
good exploitation to gauge the exploration's utility, but learning to exploit
requires information gathered via exploration. Optimizing separate objectives
for exploration and exploitation can avoid this problem, but prior meta-RL
exploration objectives yield suboptimal policies that gather information
irrelevant to the task. We alleviate both concerns by constructing an
exploitation objective that automatically identifies task-relevant information
and an exploration objective to recover only this information. This avoids
local optima in end-to-end training, without sacrificing optimal exploration.
Empirically, DREAM substantially outperforms existing approaches on complex
meta-RL problems, such as sparse-reward 3D visual navigation. Videos of DREAM:
https://ezliu.github.io/dream/
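To make the decoupling concrete, below is a minimal sketch of the two objectives described in the abstract, not the authors' implementation: an exploitation objective that shapes a task embedding, and an exploration objective that rewards trajectories from which that embedding can be recovered. All module names, shapes, and the simplified embedding-regression loss (standing in for the paper's information-theoretic objective) are illustrative assumptions.
```python
# A minimal sketch of DREAM-style decoupled objectives -- NOT the authors'
# implementation. Assumptions (all names and shapes are hypothetical): each
# training task has an integer id, the exploitation policy/Q-function is
# conditioned on a task embedding z, and exploration is scored by how well z
# can be recovered from the exploration trajectory. The paper's
# information-theoretic objective is simplified here to embedding regression.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskEncoder(nn.Module):
    """Embeds the task id; trained through the exploitation loss, so the
    embedding retains only task-relevant information."""

    def __init__(self, num_tasks, z_dim):
        super().__init__()
        self.embed = nn.Embedding(num_tasks, z_dim)

    def forward(self, task_id):
        return self.embed(task_id)


class TrajectoryEncoder(nn.Module):
    """Encodes an exploration trajectory into a guess of the task embedding."""

    def __init__(self, obs_dim, z_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, z_dim)

    def forward(self, traj):  # traj: (batch, T, obs_dim)
        _, h = self.rnn(traj)
        return self.head(h[-1])


def exploration_loss(traj_encoder, task_z, exp_traj):
    """Exploration objective: gather trajectories from which the (fixed)
    task embedding can be recovered."""
    pred_z = traj_encoder(exp_traj)
    return F.mse_loss(pred_z, task_z.detach())


# Toy batch: 10 tasks, 8-dim observations, exploration trajectories of length 5.
task_enc = TaskEncoder(num_tasks=10, z_dim=4)
traj_enc = TrajectoryEncoder(obs_dim=8, z_dim=4)
q_net = nn.Sequential(nn.Linear(8 + 4, 32), nn.ReLU(), nn.Linear(32, 1))

task_id = torch.randint(0, 10, (16,))
obs = torch.randn(16, 8)
returns = torch.randn(16, 1)          # placeholder Monte-Carlo returns
exp_traj = torch.randn(16, 5, 8)      # placeholder exploration trajectories

z = task_enc(task_id)
# Exploitation objective: ordinary RL loss (toy value regression), conditioned on z.
loss_exploit = F.mse_loss(q_net(torch.cat([obs, z], dim=-1)), returns)
# Exploration objective: recover z from the exploration trajectory.
loss_explore = exploration_loss(traj_enc, z, exp_traj)
(loss_exploit + loss_explore).backward()
```
In a full implementation, the reduction in this recovery error would serve as the exploration policy's reward, so exploration is trained with standard RL to gather exactly the information the exploitation policy needs; this is what lets the method avoid the chicken-and-egg local optima described in the abstract.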
Related papers
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.