METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
- URL: http://arxiv.org/abs/2310.08887v2
- Date: Sun, 10 Mar 2024 04:30:17 GMT
- Title: METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
- Authors: Seohong Park, Oleh Rybkin, Sergey Levine
- Abstract summary: Metric-Aware Abstraction (METRA) is a novel unsupervised reinforcement learning objective.
By learning to move in every direction in the latent space, METRA obtains a tractable set of diverse behaviors.
We show that METRA can discover a variety of useful behaviors even in complex, pixel-based environments.
- Score: 69.90741082762646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised pre-training strategies have proven to be highly effective in
natural language processing and computer vision. Likewise, unsupervised
reinforcement learning (RL) holds the promise of discovering a variety of
potentially useful behaviors that can accelerate the learning of a wide array
of downstream tasks. Previous unsupervised RL approaches have mainly focused on
pure exploration and mutual information skill learning. However, despite the
previous attempts, making unsupervised RL truly scalable still remains a major
open challenge: pure exploration approaches might struggle in complex
environments with large state spaces, where covering every possible transition
is infeasible, and mutual information skill learning approaches might
completely fail to explore the environment due to the lack of incentives. To
make unsupervised RL scalable to complex, high-dimensional environments, we
propose a novel unsupervised RL objective, which we call Metric-Aware
Abstraction (METRA). Our main idea is, instead of directly covering the entire
state space, to only cover a compact latent space $Z$ that is metrically
connected to the state space $S$ by temporal distances. By learning to move in
every direction in the latent space, METRA obtains a tractable set of diverse
behaviors that approximately cover the state space, being scalable to
high-dimensional environments. Through our experiments in five locomotion and
manipulation environments, we demonstrate that METRA can discover a variety of
useful behaviors even in complex, pixel-based environments, being the first
unsupervised RL method that discovers diverse locomotion behaviors in
pixel-based Quadruped and Humanoid. Our code and videos are available at
https://seohong.me/projects/metra/
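As a rough illustration of the objective described in the abstract, the following is a minimal sketch of a METRA-style intrinsic reward and representation loss. It is not the authors' implementation: the encoder architecture, the exact constraint handling (a Lagrange multiplier on the adjacent-state latent distance, updated by dual gradient descent), and all names (Phi, intrinsic_reward, representation_losses, log_lam) are assumptions made for illustration.

```python
# A minimal sketch of a METRA-style objective, written for illustration only.
# Assumptions not stated in the abstract: phi is a small MLP encoder into the
# latent space Z, skills z are unit vectors sampled per episode, and the
# temporal-distance constraint ||phi(s) - phi(s')|| <= 1 on adjacent states is
# enforced with a Lagrange multiplier trained by dual gradient descent.
import torch
import torch.nn as nn


class Phi(nn.Module):
    """Encoder mapping observations into the compact latent space Z."""

    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def intrinsic_reward(phi: Phi, s, s_next, z) -> torch.Tensor:
    """Reward the policy for moving along skill direction z: (phi(s') - phi(s))^T z."""
    with torch.no_grad():
        return ((phi(s_next) - phi(s)) * z).sum(dim=-1)


def representation_losses(phi: Phi, log_lam: torch.Tensor, s, s_next, z, eps: float = 1e-3):
    """Maximize latent displacement along z subject to ||phi(s') - phi(s)|| <= 1,
    using a Lagrangian with dual variable lam = exp(log_lam) >= 0."""
    delta = phi(s_next) - phi(s)
    displacement = (delta * z).sum(dim=-1)                         # term to maximize
    slack = torch.clamp(1.0 - delta.pow(2).sum(dim=-1), max=eps)   # negative iff constraint violated
    lam = log_lam.exp()
    phi_loss = -(displacement + lam.detach() * slack).mean()       # update for phi
    lam_loss = (lam * slack.detach()).mean()                       # dual update: lam grows on violations
    return phi_loss, lam_loss


# Usage sketch: sample a unit skill z per episode, relabel transitions with
# intrinsic_reward, and train the policy with any standard RL algorithm, e.g.
#   phi = Phi(obs_dim=s.shape[-1], latent_dim=2)
#   log_lam = torch.zeros(1, requires_grad=True)
#   phi_loss, lam_loss = representation_losses(phi, log_lam, s, s_next, z)
```

In this sketch the policy only ever sees the intrinsic reward, so covering every latent direction z is what stands in for covering the full state space.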
Related papers
- Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration [8.343657309038285]
Reinforcement Learning is a powerful framework for developing robot controllers for nonprehensile planar manipulation tasks.
We propose a multimodal exploration approach through categorical distributions, which enables us to train planar pushing RL policies.
We show that the learned policies are robust to external disturbances and observation noise, and scale to tasks with multiple pushers.
arXiv Detail & Related papers (2023-08-04T16:55:00Z)
- Discrete Control in Real-World Driving Environments using Deep Reinforcement Learning [2.467408627377504]
We introduce a framework (perception, planning, and control) for a real-world driving environment that transfers real-world environments into gaming environments.
We propose variations of existing Reinforcement Learning (RL) algorithms in a multi-agent setting to learn and execute discrete control in real-world environments.
arXiv Detail & Related papers (2022-11-29T04:24:03Z)
- Guaranteed Discovery of Controllable Latent States with Multi-Step Inverse Models [51.754160866582005]
The Agent-Controllable State Discovery algorithm (AC-State) consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck; a minimal sketch of such an inverse model is given after this list.
We demonstrate the discovery of controllable latent state in three domains: localizing a robot arm with distractions, exploring in a maze alongside other agents, and navigating in the Matterport house simulator.
arXiv Detail & Related papers (2022-07-17T17:06:52Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned [111.06812202454364]
We present a number of case studies involving robotic deep RL.
We discuss commonly perceived challenges in deep RL and how they have been addressed in these works.
We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting.
arXiv Detail & Related papers (2021-02-04T22:09:28Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)
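The AC-State entry above describes a multi-step inverse model with an information bottleneck. The sketch below is one plausible reading of that one-sentence summary, not the paper's implementation: it predicts the first action from a pair of temporally distant observations through a deliberately low-dimensional latent, and the network sizes, the discrete-action assumption, and all names are illustrative choices.

```python
# Illustrative sketch of a multi-step inverse model with a bottleneck latent,
# in the spirit of the AC-State entry above; architecture and sizes are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MultiStepInverseModel(nn.Module):
    """Predict the first action a_t from observations o_t and o_{t+k}.

    A deliberately small latent dimension acts as the information bottleneck,
    encouraging the encoder to keep only agent-controllable state."""

    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),           # bottleneck latent
        )
        self.action_head = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),          # logits over discrete actions
        )

    def forward(self, obs_t: torch.Tensor, obs_tk: torch.Tensor) -> torch.Tensor:
        z_t, z_tk = self.encoder(obs_t), self.encoder(obs_tk)
        return self.action_head(torch.cat([z_t, z_tk], dim=-1))


def inverse_loss(model: MultiStepInverseModel, obs_t, obs_tk, action_t) -> torch.Tensor:
    """Cross-entropy between predicted and taken first actions for a batch of
    (o_t, o_{t+k}, a_t) triples, with the offset k sampled over several horizons."""
    return nn.functional.cross_entropy(model(obs_t, obs_tk), action_t)
```

In this reading, training over many temporal offsets k while keeping the latent small is what plays the role of the bottleneck, and the controllable latent state would then be read off the encoder.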