You Only Live Once: Single-Life Reinforcement Learning
- URL: http://arxiv.org/abs/2210.08863v1
- Date: Mon, 17 Oct 2022 09:00:11 GMT
- Title: You Only Live Once: Single-Life Reinforcement Learning
- Authors: Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn
- Abstract summary: In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
- Score: 124.1738675154651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning algorithms are typically designed to learn a
performant policy that can repeatedly and autonomously complete a task, usually
starting from scratch. However, in many real-world situations, the goal might
not be to learn a policy that can do the task repeatedly, but simply to perform
a new task successfully once in a single trial. For example, imagine a disaster
relief robot tasked with retrieving an item from a fallen building, where it
cannot get direct supervision from humans. It must retrieve this object within
one test-time trial, and must do so while tackling unknown obstacles, though it
may leverage knowledge it has of the building before the disaster. We formalize
this problem setting, which we call single-life reinforcement learning (SLRL),
where an agent must complete a task within a single episode without
interventions, utilizing its prior experience while contending with some form
of novelty. SLRL provides a natural setting to study the challenge of
autonomously adapting to unfamiliar situations, and we find that algorithms
designed for standard episodic reinforcement learning often struggle to recover
from out-of-distribution states in this setting. Motivated by this observation,
we propose an algorithm, $Q$-weighted adversarial learning (QWALE), which
employs a distribution matching strategy that leverages the agent's prior
experience as guidance in novel situations. Our experiments on several
single-life continuous control problems indicate that methods based on our
distribution matching formulation are 20-60% more successful because they can
more quickly recover from novel states.
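The abstract describes QWALE only at a high level. As a concrete illustration of the kind of distribution-matching guidance it mentions, below is a minimal PyTorch sketch of a Q-weighted, discriminator-based reward bonus. The names (`Discriminator`, `discriminator_loss`, `shaped_reward`), the softmax Q-weighting, and the GAIL-style log-sigmoid bonus are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Scores how much a state resembles the agent's prior experience."""

    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states):
        # Logits: higher means the state looks like the prior data.
        return self.net(states).squeeze(-1)


def discriminator_loss(disc, prior_states, prior_q, online_states):
    # Prior states are positives, weighted by their normalized Q-values so
    # that high-value prior states count more; the single-life agent's own
    # visited states are negatives. This weighting scheme is an assumption.
    weights = torch.softmax(prior_q, dim=0) * prior_q.numel()
    pos = F.binary_cross_entropy_with_logits(
        disc(prior_states), torch.ones(len(prior_states)), weight=weights)
    neg = F.binary_cross_entropy_with_logits(
        disc(online_states), torch.zeros(len(online_states)))
    return pos + neg


def shaped_reward(disc, states):
    # GAIL-style log D(s) bonus: larger when the current state looks like the
    # prior data, pulling the policy back from out-of-distribution states.
    with torch.no_grad():
        return F.logsigmoid(disc(states))
```

During the single test-time trial, such a bonus could be added to the sparse task reward when updating the policy, with the discriminator periodically refit on the agent's most recent online states so that newly encountered out-of-distribution regions keep receiving a low score.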
Related papers
- Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research.
We introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals.
Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Continual Learning of Control Primitives: Skill Discovery via Reset-Games [128.36174682118488]
We show how a single method can allow an agent to acquire skills with minimal supervision.
We do this by exploiting the insight that the need to "reset" an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of "reset-skills".
arXiv Detail & Related papers (2020-11-10T18:07:44Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)