Learning Abstract World Model for Value-preserving Planning with Options
- URL: http://arxiv.org/abs/2406.15850v1
- Date: Sat, 22 Jun 2024 13:41:02 GMT
- Title: Learning Abstract World Model for Value-preserving Planning with Options
- Authors: Rafael Rodriguez-Sanchez, George Konidaris,
- Abstract summary: We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs)
We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP.
We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.
- Score: 11.254212901595523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: General-purpose agents require fine-grained controls and rich sensory inputs to perform a wide range of tasks. However, this complexity often leads to intractable decision-making. Traditionally, agents are provided with task-specific action and observation spaces to mitigate this challenge, but this reduces autonomy. Instead, agents must be capable of building state-action spaces at the correct abstraction level from their sensorimotor experiences. We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs) that operate at a higher level of temporal and state granularity. We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP. We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z) - Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning [1.4542411354617986]
Intelligent Cyber-Physical Systems (ICPS) represent a specialized form of Cyber-Physical System (CPS)
CNNs and Deep Reinforcement Learning (DRL) undertake multifaceted tasks encompassing perception, decision-making, and control.
DRL confronts challenges in terms of efficiency, generalization capabilities, and data scarcity during decision-making process.
We propose an innovative abstract modeling approach grounded in spatial-temporal value semantics.
arXiv Detail & Related papers (2024-05-24T02:21:10Z) - Building Minimal and Reusable Causal State Abstractions for
Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems outperforming state-of-the-art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Learning Efficient Abstract Planning Models that Choose What to Predict [28.013014215441505]
We show that existing symbolic operator learning approaches fall short in many robotics domains.
This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state.
We propose to learn operators that 'choose what to predict' by only modelling changes necessary for abstract planning to achieve specified goals.
arXiv Detail & Related papers (2022-08-16T13:12:59Z) - Inventing Relational State and Action Abstractions for Effective and
Efficient Bilevel Planning [26.715198108255162]
We develop a novel framework for learning state and action abstractions.
We learn relational, neuro-symbolic abstractions that generalize over object identities and numbers.
We show that our learned abstractions are able to quickly solve held-out tasks of longer horizons.
arXiv Detail & Related papers (2022-03-17T22:13:09Z) - Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon
Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z) - Dynamic probabilistic logic models for effective abstractions in RL [35.54018388244684]
RePReL is a hierarchical framework that leverages a relational planner to provide useful state abstractions for learning.
Our experiments show that RePReL not only achieves better performance and efficient learning on the task at hand but also demonstrates better generalization to unseen tasks.
arXiv Detail & Related papers (2021-10-15T18:53:04Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Learning Abstract Models for Strategic Exploration and Fast Reward
Transfer [85.19766065886422]
We learn an accurate Markov Decision Process (MDP) over abstract states to avoid compounding errors.
Our approach achieves strong results on three of the hardest Arcade Learning Environment games.
We can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.
arXiv Detail & Related papers (2020-07-12T03:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.