Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
- URL: http://arxiv.org/abs/2310.09997v1
- Date: Mon, 16 Oct 2023 01:13:26 GMT
- Title: Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
- Authors: Thomas Jiralerspong, Flemming Kondrup, Doina Precup, Khimya Khetarpal
- Abstract summary: We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals.
Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level.
It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
- Score: 42.275164872809746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to plan at many different levels of abstraction enables agents to
envision the long-term repercussions of their decisions and thus enables
sample-efficient learning. This becomes particularly beneficial in complex
environments with high-dimensional state spaces such as pixels, where the goal
is distant and the reward sparse. We introduce Forecaster, a deep hierarchical
reinforcement learning approach which plans over high-level goals leveraging a
temporally abstract world model. Forecaster learns an abstract model of its
environment by modelling the transition dynamics at an abstract level and
training a world model on such transitions. It then uses this world model to
choose optimal high-level goals through a tree-search planning procedure. It
additionally trains a low-level policy that learns to reach those goals. Our
method addresses not only building world models with longer horizons but also
planning with such models in downstream tasks. We empirically demonstrate
Forecaster's potential in both single-task learning and generalization to new
tasks in the AntMaze domain.
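
The planning step described in the abstract can be illustrated with a minimal sketch. It assumes a hand-written, deterministic abstract model over three made-up abstract states and an exhaustive depth-limited search; the abstract does not specify Forecaster's actual search procedure or model, which is learned from pixels. All names here (`plan_goal`, `toy_model`, `REACHABLE`, `GOALS`) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical abstract model: (abstract_state, goal) -> (next_abstract_state, reward).
# In the paper this would be a learned world model; here it is a lookup table.
AbstractModel = Callable[[str, str], Tuple[str, float]]

GOALS: List[str] = ["A", "B", "C"]

# Assumed toy dynamics: a goal is reached only if it is listed here,
# otherwise the agent stays in its current abstract state.
REACHABLE = {("A", "B"): "B", ("B", "C"): "C", ("B", "A"): "A"}


def toy_model(state: str, goal: str) -> Tuple[str, float]:
    """Return the predicted next abstract state and reward for pursuing `goal`."""
    next_state = REACHABLE.get((state, goal), state)
    # Reward 1 only on first arrival at the (assumed) task goal "C".
    reward = 1.0 if (next_state == "C" and state != "C") else 0.0
    return next_state, reward


def plan_goal(
    state: str,
    goals: List[str],
    model: AbstractModel,
    depth: int,
    discount: float = 0.9,
) -> Tuple[float, Optional[str]]:
    """Exhaustive depth-limited tree search over high-level goals.

    Returns the best predicted discounted return and the first goal
    on the best branch (None when depth is 0).
    """
    if depth == 0:
        return 0.0, None
    best_value, best_goal = float("-inf"), None
    for goal in goals:
        next_state, reward = model(state, goal)
        future, _ = plan_goal(next_state, goals, model, depth - 1, discount)
        value = reward + discount * future
        if value > best_value:  # ties keep the earlier goal
            best_value, best_goal = value, goal
    return best_value, best_goal


if __name__ == "__main__":
    print(plan_goal("A", GOALS, toy_model, depth=2))
```

In this toy setting a one-step search from state "A" sees no reward, while a two-step search selects "B" as the first goal because "C" becomes reachable from there. A low-level policy, not shown, would then be responsible for actually reaching the selected goal.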
Related papers
- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction [19.59151245929067]
We study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning.
We find this problem is best solved hierarchically by modelling items at a higher level of state abstraction than pixels.
We make use of this to propose a fully model-based algorithm that learns a discriminative world model.
arXiv Detail & Related papers (2024-08-21T17:59:31Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Learning Efficient Abstract Planning Models that Choose What to Predict [28.013014215441505]
We show that existing symbolic operator learning approaches fall short in many robotics domains.
This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state.
We propose to learn operators that 'choose what to predict' by only modelling changes necessary for abstract planning to achieve specified goals.
arXiv Detail & Related papers (2022-08-16T13:12:59Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Landmark Policy Optimization for Object Navigation Task [77.34726150561087]
This work studies the object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments.
Recent works have shown significant achievements with both end-to-end reinforcement learning approaches and modular systems, but need a big step forward to be robust and optimal.
We propose a hierarchical method that incorporates standard task formulation and additional area knowledge as landmarks, with a way to extract these landmarks.
arXiv Detail & Related papers (2021-09-17T12:28:46Z)
- World Model as a Graph: Learning Latent Landmarks for Planning [12.239590266108115]
Planning is a hallmark of human intelligence.
One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts.
We propose to learn graph-structured world models composed of sparse, multi-step transitions.
arXiv Detail & Related papers (2020-11-25T02:49:21Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- From proprioception to long-horizon planning in novel environments: A hierarchical RL model [4.44317046648898]
In this work, we introduce a simple, three-level hierarchical architecture that reflects different types of reasoning.
We apply our method to a series of navigation tasks in the Mujoco Ant environment.
arXiv Detail & Related papers (2020-06-11T17:19:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.