When Learning Is Out of Reach, Reset: Generalization in Autonomous
Visuomotor Reinforcement Learning
- URL: http://arxiv.org/abs/2303.17600v1
- Date: Thu, 30 Mar 2023 17:59:26 GMT
- Title: When Learning Is Out of Reach, Reset: Generalization in Autonomous
Visuomotor Reinforcement Learning
- Authors: Zichen Zhang, Luca Weihs
- Abstract summary: Episodic training, where an agent's environment is reset after every success or failure, is the de facto standard when training embodied reinforcement learning (RL) agents.
In this work, we look to minimize, rather than completely eliminate, resets while building visual agents that can meaningfully generalize.
Our proposed approach significantly outperforms prior episodic, reset-free, and reset-minimizing approaches, achieving higher success rates.
- Score: 10.469509984098705
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Episodic training, where an agent's environment is reset after every success
or failure, is the de facto standard when training embodied reinforcement
learning (RL) agents. The underlying assumption that the environment can be
easily reset is limiting both practically, as resets generally require human
effort in the real world and can be computationally expensive in simulation,
and philosophically, as we'd expect intelligent agents to be able to
continuously learn without intervention. Work in learning without any resets,
i.e., Reset-Free RL (RF-RL), is promising but is plagued by the problem of
irreversible transitions (e.g., an object breaking) which halt learning.
Moreover, the limited state diversity and instrument setup encountered during
RF-RL means that works studying RF-RL largely do not require their models to
generalize to new environments. In this work, we instead look to minimize,
rather than completely eliminate, resets while building visual agents that can
meaningfully generalize. As studying generalization has previously not been a
focus of benchmarks designed for RF-RL, we propose a new Stretch Pick-and-Place
benchmark designed for evaluating generalizations across goals, cosmetic
variations, and structural changes. Moreover, towards building performant
reset-minimizing RL agents, we propose unsupervised metrics to detect
irreversible transitions and a single-policy training mechanism to enable
generalization. Our proposed approach significantly outperforms prior episodic,
reset-free, and reset-minimizing approaches achieving higher success rates with
fewer resets in Stretch-P&P and another popular RF-RL benchmark. Finally, we
find that our proposed approach can dramatically reduce the number of resets
required for training other embodied tasks, in particular for RoboTHOR
ObjectNav we obtain higher success rates than episodic approaches using 99.97%
fewer resets.
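The abstract does not spell out the unsupervised irreversibility metric, so the following is only a minimal sketch under assumed choices (an embedding-distance score plus a threshold-and-patience trigger; all function and parameter names are hypothetical) of how a reset-minimizing loop might decide when a reset is worth requesting.

# Illustrative sketch only (the abstract does not specify the authors' metric; the
# formulation below is an assumed, generic one): request a reset only when an
# unsupervised irreversibility score stays high for a sustained window.
import numpy as np

def irreversibility_score(current_embedding, reachable_embeddings):
    """Assumed proxy: distance from the current visual embedding to the closest
    embedding of states known to be recoverable (e.g., seen near episode starts)."""
    dists = np.linalg.norm(reachable_embeddings - current_embedding, axis=1)
    return float(dists.min())

def should_request_reset(score_history, threshold=5.0, patience=50):
    """Trigger a (costly) reset only if the score exceeded the threshold for
    `patience` consecutive steps, keeping the total number of resets small."""
    recent = score_history[-patience:]
    return len(recent) == patience and all(s > threshold for s in recent)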
Related papers
- World Models Increase Autonomy in Reinforcement Learning [6.151562278670799]
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents.
The MoReFree agent adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks.
It exhibits superior data-efficiency across various reset-free tasks without access to environmental reward or demonstrations.
arXiv Detail & Related papers (2024-08-19T08:56:00Z)
- Intelligent Switching for Reset-Free RL [19.154045065314243]
In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable.
Recent work aims to train agents with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state.
We create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal.
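A minimal sketch of this kind of confidence-based switching, assuming a goal-conditioned Q-value can be turned into a rough success probability (the normalization, threshold, and function names below are illustrative, not RISC's implementation):

# Sketch (not the authors' code): switch between a forward task policy and a
# backward reset policy based on an estimated probability of reaching the current goal.
import numpy as np

def estimate_success_prob(q_value, q_min=-100.0, q_max=0.0):
    # Hypothetical confidence proxy: normalize a goal-conditioned Q-value to [0, 1].
    return float(np.clip((q_value - q_min) / (q_max - q_min), 0.0, 1.0))

def choose_controller(q_forward_goal, q_backward_goal, threshold=0.8, rng=np.random):
    """Return 'forward' or 'backward'. If either controller is likely to finish its
    current goal, keep pursuing that goal; otherwise pick one at random so the agent
    keeps gathering useful experience."""
    if estimate_success_prob(q_forward_goal) >= threshold:
        return "forward"
    if estimate_success_prob(q_backward_goal) >= threshold:
        return "backward"
    return "forward" if rng.random() < 0.5 else "backward"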
arXiv Detail & Related papers (2024-05-02T19:15:00Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance that matches or exceeds PPO and DPO.
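As a hedged reading of the title and summary, REBEL's core update regresses the (scaled) difference of policy log-ratios for a pair of responses onto the difference of their rewards; the sketch below assumes per-response summed log-probabilities and uses illustrative names.

# Hedged sketch of a REBEL-style pairwise regression loss (names are illustrative).
import torch

def rebel_pair_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                    reward_a, reward_b, eta=1.0):
    """Squared-error loss for one (prompt, response_a, response_b) pair.

    logp_new_* : log-prob of the response under the policy being optimized
    logp_old_* : log-prob under the previous-iterate (reference) policy
    reward_*   : scalar rewards for the two responses
    """
    log_ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    return ((1.0 / eta) * log_ratio_diff - (reward_a - reward_b)) ** 2

# Example with dummy scalars:
loss = rebel_pair_loss(torch.tensor(-3.2), torch.tensor(-3.0),
                       torch.tensor(-4.1), torch.tensor(-4.0),
                       reward_a=1.0, reward_b=0.2)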
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model [119.65409513119963]
We introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form.
The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight.
Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.
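The closed-form reparameterization leads to the standard DPO objective: maximize the log-sigmoid of beta times the gap in policy-versus-reference log-ratios between the preferred and dispreferred responses. A compact sketch (tensor names are illustrative):

# DPO loss computed from summed token log-probs of each response.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss; larger margins between chosen and rejected
    log-ratios (relative to the reference policy) reduce the loss."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()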
arXiv Detail & Related papers (2023-05-29T17:57:46Z)
- Provable Reset-free Reinforcement Learning by No-Regret Reduction [13.800970428473134]
We propose a generic no-regret reduction to systematically design reset-free RL algorithms.
Our reduction turns the reset-free RL problem into a two-player game.
We show that achieving sublinear regret in this two-player game would imply learning a policy that has both sublinear performance regret and sublinear total number of resets in the original RL problem.
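In symbols, and only as a paraphrase of this claim rather than the paper's exact theorem statement: if the two-player game admits regret $\mathrm{Regret}_{\text{game}}(K) = o(K)$ over $K$ episodes, then both the performance regret $\sum_{k=1}^{K}\big(V^{\pi^\star} - V^{\pi_k}\big) = o(K)$ and the cumulative number of resets incurred over those $K$ episodes are $o(K)$.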
arXiv Detail & Related papers (2023-01-06T05:51:53Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
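As a generic illustration of a value-based start-state curriculum (assumptions mine, not VaPRL's exact procedure): train from previously visited states whose estimated value suggests the current policy can succeed from them, and anneal toward the task's true initial state over time.

# Generic sketch of a value-based start-state curriculum (not VaPRL's algorithm).
import random

def sample_start_state(visited_states, value_fn, true_initial_state,
                       value_threshold=0.5, anneal=0.0, rng=random):
    """anneal in [0, 1]: probability of using the real initial state instead of a subgoal."""
    if rng.random() < anneal:
        return true_initial_state
    # Otherwise start from a visited state the current policy can plausibly solve from.
    candidates = [s for s in visited_states if value_fn(s) >= value_threshold]
    return rng.choice(candidates) if candidates else true_initial_state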
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
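A schematic form of such an objective, paraphrasing the summary rather than quoting the paper's exact formulation: $\max_\theta \; \mathbb{E}_{\pi_\theta}\big[\sum_t \gamma^t r_t\big] - \beta_k\, I(Z; S)$, where $Z$ is the learned latent representation of the state $S$ and the coefficient $\beta_k$ is gradually increased (annealed) over training so that information redundant for solving the task is progressively squeezed out.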
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
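My reading of ITER from this summary is that the current agent is periodically distilled into a freshly initialized network, so the new network never trains on the earlier, transient data distribution; the sketch below shows one assumed distillation step (the model interface and names are illustrative).

# Rough sketch of one ITER-style distillation phase: a freshly initialized student is
# trained to match the current teacher's policy and value on recent observations.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, observations, optimizer):
    """One gradient step of policy/value distillation on a batch of observations.
    Both models are assumed to return (action_logits, value_estimates)."""
    with torch.no_grad():
        teacher_logits, teacher_values = teacher(observations)
    student_logits, student_values = student(observations)
    policy_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                           F.softmax(teacher_logits, dim=-1),
                           reduction="batchmean")
    value_loss = F.mse_loss(student_values, teacher_values)
    loss = policy_loss + 0.5 * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()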
arXiv Detail & Related papers (2020-06-10T13:26:31Z)