World Model as a Graph: Learning Latent Landmarks for Planning
- URL: http://arxiv.org/abs/2011.12491v3
- Date: Wed, 30 Jun 2021 21:00:52 GMT
- Title: World Model as a Graph: Learning Latent Landmarks for Planning
- Authors: Lunjun Zhang, Ge Yang, Bradly C. Stadie
- Abstract summary: Planning is a hallmark of human intelligence.
One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts.
We propose to learn graph-structured world models composed of sparse, multi-step transitions.
- Score: 12.239590266108115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning - the ability to analyze the structure of a problem in the large and
decompose it into interrelated subproblems - is a hallmark of human
intelligence. While deep reinforcement learning (RL) has shown great promise
for solving relatively straightforward control tasks, it remains an open
problem how to best incorporate planning into existing deep RL paradigms to
handle increasingly complex environments. One prominent framework, Model-Based
RL, learns a world model and plans using step-by-step virtual rollouts. This
type of world model quickly diverges from reality when the planning horizon
increases, thus struggling at long-horizon planning. How can we learn world
models that endow agents with the ability to do temporally extended reasoning?
In this work, we propose to learn graph-structured world models composed of
sparse, multi-step transitions. We devise a novel algorithm to learn latent
landmarks that are scattered (in terms of reachability) across the goal space
as the nodes on the graph. In this same graph, the edges are the reachability
estimates distilled from Q-functions. On a variety of high-dimensional
continuous control tasks ranging from robotic manipulation to navigation, we
demonstrate that our method, named L3P, significantly outperforms prior work,
and is oftentimes the only method capable of leveraging both the robustness of
model-free RL and the generalization of graph-search algorithms. We believe our
work is an important step towards scalable planning in reinforcement learning.
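To make the idea concrete, here is a minimal, illustrative sketch of planning over a graph of latent landmarks. Everything below (the landmark set, the `reachability` function, the kNN sparsification) is a demo assumption, not the authors' L3P implementation: L3P learns the landmarks end-to-end and distills edge weights from goal-conditioned Q-functions, whereas this toy uses Euclidean distance as the reachability proxy.

```python
# Illustrative sketch only: a graph of latent landmarks searched with
# Dijkstra. All names and numbers are demo assumptions, not L3P code;
# in particular, `reachability` stands in for an estimate distilled
# from a goal-conditioned Q-function.
import heapq
import numpy as np

rng = np.random.default_rng(0)

# Stand-in latent landmarks (L3P learns these; here: random 2-D points).
landmarks = rng.normal(size=(8, 2))

def reachability(z_a, z_b):
    # Proxy edge weight: Euclidean distance. In L3P this would be a
    # learned estimate of the steps needed to reach z_b from z_a.
    return float(np.linalg.norm(z_a - z_b))

# Sparse graph: each landmark connects to its k nearest neighbours.
k = 3
edges = {i: [] for i in range(len(landmarks))}
for i, z in enumerate(landmarks):
    dists = sorted((reachability(z, w), j)
                   for j, w in enumerate(landmarks) if j != i)
    for d, j in dists[:k]:
        edges[i].append((j, d))
# Keep the demo graph connected by also linking consecutive landmarks.
for i in range(len(landmarks) - 1):
    d = reachability(landmarks[i], landmarks[i + 1])
    edges[i].append((i + 1, d))
    edges[i + 1].append((i, d))

def dijkstra(start, goal):
    # Shortest path in reachability cost; returns the landmark sequence.
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# A goal-conditioned, model-free policy would then pursue the first
# landmark on the path as its next subgoal.
print(dijkstra(start=0, goal=5))
```

The design point this sketch tries to capture is that the graph stores sparse, multi-step reachability rather than one-step dynamics, so the search operates over temporally extended jumps while a model-free policy handles local control.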
Related papers
- A New View on Planning in Online Reinforcement Learning [19.35031543927374]
This paper investigates a new approach to model-based reinforcement learning using background planning.
We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
arXiv Detail & Related papers (2024-06-03T17:45:19Z)
- Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs).
In this paper, we explore graph learning-based methods for task planning, a direction orthogonal to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels [42.275164872809746]
We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals.
Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level.
It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
arXiv Detail & Related papers (2023-10-16T01:13:26Z)
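As a rough illustration of the Forecaster entry above, the sketch below runs an exhaustive tree search over a toy discrete space of high-level goals. The abstract transition and reward are invented stand-ins; Forecaster learns its abstract model from pixels, which is not reproduced here.

```python
# Sketch of tree search over high-level goals with an abstract model:
# expand subgoal choices a few steps deep and back up the best value.
# `abstract_step` is a hand-written stand-in, not a learned model.
SUBGOALS = [-1, 0, 1]          # toy discrete high-level goal space
GAMMA = 0.9

def abstract_step(state, subgoal):
    # Stand-in for the learned abstract transition and reward.
    nxt = state + subgoal
    return nxt, -abs(nxt)

def search(state, depth):
    # Exhaustive tree search returning (value, best first subgoal).
    if depth == 0:
        return 0.0, None
    best_v, best_g = float("-inf"), None
    for g in SUBGOALS:
        nxt, r = abstract_step(state, g)
        v, _ = search(nxt, depth - 1)
        total = r + GAMMA * v
        if total > best_v:
            best_v, best_g = total, g
    return best_v, best_g

value, subgoal = search(state=3, depth=4)
print(subgoal)   # the high-level goal handed to a low-level worker policy
```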
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Graph Value Iteration [35.87805182676444]
Deep Reinforcement Learning (RL) has been successful in various search domains, such as two-player games and scientific discovery.
One major difficulty is that without a human-crafted heuristic function, reward signals remain zero unless the learning framework discovers a solution plan.
We propose a domain-independent method that augments graph search with graph value iteration to solve hard planning instances.
arXiv Detail & Related papers (2022-09-20T10:45:03Z)
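The sketch below illustrates the core mechanic named in the Graph Value Iteration entry above: propagating value through an explicit graph under a sparse, goal-only reward. The graph and constants are invented for the demo; the paper's actual algorithm interleaves such value propagation with graph search on hard planning instances.

```python
# Minimal value iteration on an explicit search graph with a sparse
# reward: only the goal node yields reward, and value decays
# geometrically with distance from it.
GAMMA = 0.95

# Adjacency list of a small deterministic graph: node -> reachable nodes.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
goal = 4

# V[n] estimates the best discounted return achievable from node n.
V = {n: 0.0 for n in graph}
for _ in range(50):                      # sweep until values stabilise
    for n, succs in graph.items():
        if n == goal:
            V[n] = 1.0                   # sparse reward: 1 at the goal only
        elif succs:
            V[n] = GAMMA * max(V[s] for s in succs)

print(V)  # values shrink with each extra step needed to reach the goal
```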
- Hierarchies of Planning and Reinforcement Learning for Robot Navigation [22.08479169489373]
In many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available.
Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation.
This work proposes a novel hierarchical framework that utilizes a trainable planning policy for the HL representation.
arXiv Detail & Related papers (2021-09-23T07:18:15Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
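To show what "planning sequences of states rather than just actions" can look like, here is a toy collocation sketch: the optimizer treats the latent states themselves as decision variables and softly penalizes dynamics violations alongside the task cost. The dynamics function `f` and the cost weighting are assumptions; the paper uses learned latent state-space models from images and also recovers actions, neither of which is reproduced here.

```python
# Toy collocation in a latent space: optimise the *states* of a
# trajectory directly. A fixed soft penalty enforces dynamics here;
# practical collocation methods tighten this constraint over time.
import torch

def f(z):                           # stand-in latent dynamics (drift to origin)
    return 0.9 * z

z0 = torch.tensor([4.0, 0.0])       # fixed initial latent state
z_goal = torch.tensor([0.0, 3.0])   # latent goal
H = 10                              # planning horizon

# Decision variables: the intermediate latent states themselves.
traj = torch.zeros(H, 2, requires_grad=True)
opt = torch.optim.Adam([traj], lr=0.1)

for step in range(500):
    opt.zero_grad()
    states = torch.cat([z0.unsqueeze(0), traj], dim=0)
    # Dynamics residual: consecutive states should be consistent with f.
    dyn = ((states[1:] - f(states[:-1])) ** 2).sum()
    # Task cost: the final state should land on the goal.
    task = ((states[-1] - z_goal) ** 2).sum()
    (dyn + task).backward()
    opt.step()

print(traj[-1])   # final planned latent state, near z_goal if dynamics allow
```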
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- Evolutionary Planning in Latent Space [7.863826008567604]
Planning is a powerful approach to reinforcement learning with several desirable properties.
We learn a world model that enables Evolutionary Planning in Latent Space.
We show how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy.
arXiv Detail & Related papers (2020-11-23T09:21:30Z)
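A minimal sketch of the evolutionary planning loop described in the entry above: populations of action sequences are rolled out inside a (here, hand-written) latent model, the best plans survive, and mutated copies form the next generation. The transition and reward functions are toy stand-ins, not the paper's learned world model.

```python
# Evolutionary planning inside a toy latent model: sample plans, score
# them by model rollouts, keep the elite, mutate, repeat.
import numpy as np

rng = np.random.default_rng(1)
H, POP, GEN, ELITE = 15, 64, 30, 8   # horizon, population, generations, elite

def rollout_return(z, actions):
    # Roll a plan through the toy latent model; reward is closeness to 0.
    total = 0.0
    for a in actions:
        z = 0.95 * z + a                 # stand-in learned transition
        total += -abs(z)                 # stand-in learned reward
    return total

z0 = 5.0
plans = rng.normal(0.0, 1.0, size=(POP, H))
for _ in range(GEN):
    scores = np.array([rollout_return(z0, p) for p in plans])
    elite = plans[np.argsort(scores)[-ELITE:]]       # best plans survive
    # Next generation: mutated copies of the elite.
    parents = elite[rng.integers(0, ELITE, size=POP)]
    plans = parents + rng.normal(0.0, 0.3, size=(POP, H))

best = plans[np.argmax([rollout_return(z0, p) for p in plans])]
print(best[:3])   # execute the first action(s), then replan (MPC-style)
```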
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Graph Ordering: Towards the Optimal by Learning [69.72656588714155]
Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection.
However, for some kinds of graph applications, such as graph compression and edge partition, it is very hard to reduce them to graph representation learning tasks.
In this paper, we propose to attack the graph ordering problem behind such applications with a novel learning approach.
arXiv Detail & Related papers (2020-01-18T09:14:16Z)