Planning Goals for Exploration
- URL: http://arxiv.org/abs/2303.13002v1
- Date: Thu, 23 Mar 2023 02:51:50 GMT
- Title: Planning Goals for Exploration
- Authors: Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
- Abstract summary: "Planning Exploratory Goals" (PEG) is a method that sets goals for each training episode to directly optimize an intrinsic exploration reward.
PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands".
- Score: 22.047797646698527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dropped into an unknown environment, what should an agent do to quickly learn
about the environment and how to accomplish diverse tasks within it? We address
this question within the goal-conditioned reinforcement learning paradigm, by
identifying how the agent should set its goals at training time to maximize
exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets
goals for each training episode to directly optimize an intrinsic exploration
reward. PEG first chooses goal commands such that the agent's goal-conditioned
policy, at its current level of training, will end up in states with high
exploration potential. It then launches an exploration policy starting at those
promising states. To enable this direct optimization, PEG learns world models
and adapts sampling-based planning algorithms to "plan goal commands". In
challenging simulated robotics environments including a multi-legged ant robot
in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables
more efficient and effective training of goal-conditioned policies relative to
baselines and ablations. Our ant successfully navigates a long maze, and the
robot arm successfully builds a stack of three blocks upon command. Website:
https://penn-pal-lab.github.io/peg/
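To make the goal-planning step described above concrete, below is a minimal sketch of how goal commands could be optimized against an intrinsic exploration reward with a sampling-based planner (here the cross-entropy method). All names (`world_model_rollout`, `goal_policy`, `exploration_value`) are hypothetical placeholders, not the authors' implementation, which plans inside a learned latent world model.

```python
# Hedged sketch of PEG-style goal planning with the cross-entropy method (CEM).
# The callables passed in are placeholders for a learned world model, a
# goal-conditioned policy, and an intrinsic exploration-reward estimator.
import numpy as np

def plan_exploratory_goal(world_model_rollout, goal_policy, exploration_value,
                          state, goal_dim, horizon=50,
                          iters=5, pop=256, elites=32, seed=0):
    """Search for the goal command whose goal-conditioned rollout ends in
    states with the highest estimated exploration potential."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(goal_dim), np.ones(goal_dim)
    for _ in range(iters):
        goals = rng.normal(mean, std, size=(pop, goal_dim))  # candidate goal commands
        scores = np.empty(pop)
        for i, g in enumerate(goals):
            # Imagine chasing goal g with the current policy inside the world
            # model, then score the final imagined state by exploration value.
            traj = world_model_rollout(state, lambda s, g=g: goal_policy(s, g), horizon)
            scores[i] = exploration_value(traj[-1])
        elite = goals[np.argsort(scores)[-elites:]]          # refit to the best goals
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean  # goal command to issue for the next training episode
```

At training time, the agent would condition its policy on the returned goal command for the first part of the episode and then switch to an exploration policy from wherever it ends up, as described in the abstract.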
Related papers
- Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning [6.266160051617362]
"Cluster Edge Exploration" ($CE2$) is a new goal-directed exploration algorithm that gives priority to goal states that remain accessible to the agent.
In challenging robotics environments, $CE2$ demonstrates superior efficiency in exploration compared to baseline methods and ablations.
arXiv Detail & Related papers (2024-11-03T01:21:43Z)
- Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning [11.134855513221359]
We propose a novel deep reinforcement learning approach for multi-robot informative path planning.
We train our reinforcement learning policy via the centralized training and decentralized execution paradigm.
Our approach outperforms other state-of-the-art multi-robot target mapping approaches by 33.75% in terms of the number of discovered targets-of-interest.
arXiv Detail & Related papers (2024-09-25T14:27:37Z)
- A Backbone for Long-Horizon Robot Task Understanding [8.889888977376886]
We propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability.
This framework uses therbligs as the backbone to decompose high-level robot tasks into elemental robot configurations.
Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing.
arXiv Detail & Related papers (2024-08-02T15:32:42Z)
- Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control.
PSL achieves success rates of over 85%, outperforming language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- Advanced Skills by Learning Locomotion and Local Navigation End-to-End [10.872193480485596]
In this work, we propose to solve the complete problem by training an end-to-end policy with deep reinforcement learning.
We demonstrate the successful deployment of policies on a real quadrupedal robot.
arXiv Detail & Related papers (2022-09-26T16:35:00Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Follow the Object: Curriculum Learning for Manipulation Tasks with Imagined Goals [8.98526174345299]
This paper introduces a notion of imaginary object goals.
For a given manipulation task, the object of interest is first trained to reach a desired target position on its own.
The object policy is then leveraged to build a predictive model of plausible object trajectories.
The proposed algorithm, Follow the Object, has been evaluated on 7 MuJoCo environments.
arXiv Detail & Related papers (2020-08-05T12:19:14Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum over the goals the agent needs to solve, prioritizing goals by value disagreement (a sketch of this idea follows the list).
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
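The last entry's goal-selection idea can be illustrated with a short, hedged sketch. It assumes goals are scored by disagreement across an ensemble of goal-conditioned value functions, as the paper's title suggests; the names and signatures below are hypothetical and do not reflect the authors' code.

```python
# Hedged sketch of a value-disagreement goal curriculum (hypothetical API):
# goals on which an ensemble of goal-conditioned value functions disagrees
# most are treated as being at the frontier of the agent's competence.
import numpy as np

def sample_training_goals(candidate_goals, value_ensemble, state, n_goals=8, seed=0):
    """Pick training goals with probability proportional to the ensemble's
    disagreement about how achievable each goal is from `state`."""
    rng = np.random.default_rng(seed)
    # value_ensemble: list of callables, each mapping (state, goal) -> scalar value estimate.
    values = np.array([[v(state, g) for v in value_ensemble] for g in candidate_goals])
    disagreement = values.std(axis=1) + 1e-8   # epistemic spread per candidate goal
    probs = disagreement / disagreement.sum()
    idx = rng.choice(len(candidate_goals), size=n_goals, replace=False, p=probs)
    return [candidate_goals[i] for i in idx]
```

Intuitively, goals with high disagreement are neither already mastered nor clearly out of reach, which is what makes them useful curriculum candidates.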