PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
- URL: http://arxiv.org/abs/2006.00900v1
- Date: Mon, 1 Jun 2020 12:53:09 GMT
- Title: PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
- Authors: Henry Charlesworth and Giovanni Montana
- Abstract summary: PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN achieves performance comparable to model-free baselines whilst being around 4-8 times more sample-efficient.
- Score: 14.315501760755609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning with sparse rewards remains a significant challenge in reinforcement
learning (RL), especially when the aim is to train a policy capable of
achieving multiple different goals. To date, the most successful approaches for
dealing with multi-goal, sparse reward environments have been model-free RL
algorithms. In this work we propose PlanGAN, a model-based algorithm
specifically designed for solving multi-goal tasks in environments with sparse
rewards. Our method builds on the fact that any trajectory of experience
collected by an agent contains useful information about how to achieve the
goals observed during that trajectory. We use this to train an ensemble of
conditional generative models (GANs) to generate plausible trajectories that
lead the agent from its current state towards a specified goal. We then combine
these imagined trajectories into a novel planning algorithm in order to achieve
the desired goal as efficiently as possible. The performance of PlanGAN has
been tested on a number of robotic navigation/manipulation tasks in comparison
with a range of model-free reinforcement learning baselines, including
Hindsight Experience Replay. Our studies indicate that PlanGAN can achieve
comparable performance whilst being around 4-8 times more sample-efficient.
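To make the approach concrete, here is a minimal sketch of the core loop, assuming a PyTorch setup: an ensemble of goal-conditioned generators imagines trajectories towards a specified goal, and a simple planner executes the first action of the best-scoring rollout. The dimensions, network sizes, and distance-based scoring rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM, NOISE_DIM = 10, 3, 4, 8  # illustrative sizes

class TrajectoryGenerator(nn.Module):
    """Conditional generator G(s, g, z) -> (action, next state)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + GOAL_DIM + NOISE_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM + STATE_DIM),
        )

    def forward(self, state, goal, noise):
        out = self.net(torch.cat([state, goal, noise], dim=-1))
        return out[..., :ACTION_DIM], out[..., ACTION_DIM:]

def plan_first_action(ensemble, state, goal, horizon=10, samples=16):
    """Imagine trajectories with every ensemble member and return the first
    action of the rollout whose final state lands closest to the goal."""
    best_action, best_dist = None, float("inf")
    with torch.no_grad():
        for gen in ensemble:
            for _ in range(samples):
                s, first_action = state, None
                for t in range(horizon):
                    action, s = gen(s, goal, torch.randn(NOISE_DIM))
                    if t == 0:
                        first_action = action
                # Sparse-reward proxy: how close the imagined trajectory ends
                # to the goal (assumes the goal is the first GOAL_DIM state
                # coordinates).
                dist = torch.norm(s[:GOAL_DIM] - goal).item()
                if dist < best_dist:
                    best_dist, best_action = dist, first_action
    return best_action

ensemble = [TrajectoryGenerator() for _ in range(3)]
print(plan_first_action(ensemble, torch.zeros(STATE_DIM), torch.ones(GOAL_DIM)))
```

In the paper the planner combines imagined trajectories from the whole ensemble more carefully; scoring imagined end states by distance to the goal is one simple stand-in for that step.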
Related papers
- PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy.
PLANRL switches between two modes of operation: reaching a waypoint with classical techniques when far from the objects, and fine-grained manipulation control when about to interact with them.
We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
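A minimal sketch of this mode-switching idea, assuming a distance threshold triggers the hand-over (the controller, threshold, and function names are illustrative, not PLANRL's implementation):

```python
import numpy as np

SWITCH_RADIUS = 0.05  # hypothetical hand-over distance in metres

def move_to_waypoint(state, waypoint, gain=1.0):
    """Classical proportional controller that steers toward a waypoint."""
    return gain * (waypoint - state)

def select_action(state, object_pos, learned_policy):
    """Coarse classical reaching far from the object, learned control near it."""
    if np.linalg.norm(state - object_pos) > SWITCH_RADIUS:
        return move_to_waypoint(state, object_pos)
    return learned_policy(state)

# Example with a placeholder learned policy:
action = select_action(np.zeros(3), np.array([0.2, 0.1, 0.0]),
                       learned_policy=lambda s: np.zeros(3))
```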
arXiv Detail & Related papers (2024-08-07T19:30:08Z) - Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
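A hedged illustration of such a vector-valued reward; the objectives and their definitions are invented for this sketch:

```python
import numpy as np

def vector_reward(pos, goal, obstacle_dist, energy_used):
    """One scalar per objective: goal progress, safety, energy efficiency."""
    return np.array([
        -np.linalg.norm(goal - pos),  # get closer to the goal
        min(obstacle_dist, 1.0),      # keep clear of obstacles
        -energy_used,                 # conserve energy
    ])

r = vector_reward(np.zeros(2), np.array([3.0, 4.0]),
                  obstacle_dist=0.5, energy_used=0.1)
```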
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models [31.628341050846768]
Goal-conditioned Offline Planning (GOPlan) is a novel model-based framework that contains two key phases.
GOPlan pretrains a prior policy capable of capturing the multi-modal action distributions within the multi-goal dataset.
The reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals.
arXiv Detail & Related papers (2023-10-30T21:19:52Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
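A minimal sketch of that distillation step, assuming simple linear policies and a squared-error imitation loss (all names and shapes are hypothetical):

```python
import torch
import torch.nn as nn

STATE_DIM, GOAL_DIM, ACTION_DIM = 8, 3, 2
student = nn.Linear(STATE_DIM + GOAL_DIM, ACTION_DIM)  # conditioned on final goal
teacher = nn.Linear(STATE_DIM + GOAL_DIM, ACTION_DIM)  # conditioned on subgoal

state = torch.randn(STATE_DIM)
final_goal, subgoal = torch.randn(GOAL_DIM), torch.randn(GOAL_DIM)
with torch.no_grad():  # the subgoal-conditioned teacher provides the target
    target_action = teacher(torch.cat([state, subgoal]))
loss = ((student(torch.cat([state, final_goal])) - target_action) ** 2).mean()
loss.backward()  # one self-imitation gradient step for the student
```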
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
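A hedged sketch of a discretizing bottleneck, assuming a fixed codebook and nearest-neighbour lookup (the paper's actual architecture and training are more involved):

```python
import torch

codebook = torch.randn(32, 16)  # 32 discrete codes of dimension 16

def discretize(goal_embedding):
    """Snap a continuous goal embedding to its nearest codebook entry."""
    dists = torch.cdist(goal_embedding[None, :], codebook)[0]
    return codebook[dists.argmin()]

code = discretize(torch.randn(16))  # the policy would condition on this code
```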
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z) - Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state space models.
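A minimal collocation-style sketch, assuming a toy dynamics model and quadratic costs (the paper works with learned latent-space models rather than this stand-in):

```python
import torch

T, STATE_DIM = 10, 4
f = lambda s: 0.9 * s            # stand-in for a learned latent dynamics model
goal = torch.ones(STATE_DIM)

states = torch.zeros(T, STATE_DIM, requires_grad=True)
opt = torch.optim.Adam([states], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    # Optimize the whole state sequence directly, penalizing dynamics
    # violations instead of rolling actions forward step by step.
    dyn_violation = ((states[1:] - f(states[:-1])) ** 2).sum()
    goal_cost = ((states[-1] - goal) ** 2).sum()
    (goal_cost + 10.0 * dyn_violation).backward()
    opt.step()
print(torch.norm(states[-1] - goal).item())  # final distance to the goal
```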
arXiv Detail & Related papers (2021-06-24T17:59:18Z) - Online Bayesian Goal Inference for Boundedly-Rational Planning Agents [46.60073262357339]
We present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions.
Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning.
We develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models.
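A toy sequential-Monte-Carlo goal inference loop in the spirit of SIPS; the 1-D grid world and likelihood model are invented for illustration:

```python
import random

GOALS = [0, 5, 9]                              # candidate goal cells on a line
particles = [random.choice(GOALS) for _ in range(100)]

def action_likelihood(pos, action, goal, eps=0.1):
    """Boundedly rational agent: it usually steps toward its goal."""
    toward = 1 if goal > pos else -1
    return 1 - eps if action == toward else eps

def update(particles, pos, action):
    """Reweight and resample goal particles given one observed action."""
    weights = [action_likelihood(pos, action, g) for g in particles]
    return random.choices(particles, weights=weights, k=len(particles))

particles = update(particles, pos=2, action=1)   # observe a rightward step
print(max(set(particles), key=particles.count))  # most probable goal so far
```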
arXiv Detail & Related papers (2020-06-13T01:48:10Z) - Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
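One way to read the hybrid objective, sketched under the assumption that it trades expected reward against model uncertainty with a decaying exploration weight (the actual RP1 objective differs):

```python
def hybrid_objective(expected_reward, model_uncertainty, step, total_steps):
    """Exploration bonus whose weight decays as training progresses."""
    beta = 1.0 - step / total_steps
    return expected_reward + beta * model_uncertainty

# Early on the agent seeks data that improves the model; later it exploits.
print(hybrid_objective(1.0, 0.5, step=10, total_steps=100))
```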
arXiv Detail & Related papers (2020-02-07T09:57:53Z)