Learning Abstract Models for Strategic Exploration and Fast Reward Transfer
- URL: http://arxiv.org/abs/2007.05896v1
- Date: Sun, 12 Jul 2020 03:33:50 GMT
- Title: Learning Abstract Models for Strategic Exploration and Fast Reward Transfer
- Authors: Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
- Abstract summary: We learn an accurate Markov Decision Process (MDP) over abstract states to avoid compounding errors.
Our approach achieves strong results on three of the hardest Arcade Learning Environment games.
We can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.
- Score: 85.19766065886422
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Model-based reinforcement learning (RL) is appealing because (i) it enables
planning and thus more strategic exploration, and (ii) by decoupling dynamics
from rewards, it enables fast transfer to new reward functions. However,
learning an accurate Markov Decision Process (MDP) over high-dimensional states
(e.g., raw pixels) is extremely challenging because it requires function
approximation, which leads to compounding errors. Instead, to avoid compounding
errors, we propose learning an abstract MDP over abstract states:
low-dimensional coarse representations of the state (e.g., capturing agent
position, ignoring other objects). We assume access to an abstraction function
that maps the concrete states to abstract states. In our approach, we construct
an abstract MDP, which grows through strategic exploration via planning.
Similar to hierarchical RL approaches, the abstract actions of the abstract MDP
are backed by learned subpolicies that navigate between abstract states. Our
approach achieves strong results on three of the hardest Arcade Learning
Environment games (Montezuma's Revenge, Pitfall!, and Private Eye), including
superhuman performance on Pitfall! without demonstrations. After training on
one task, we can reuse the learned abstract MDP for new reward functions,
achieving higher reward in 1000x fewer samples than model-free methods trained
from scratch.
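As a concrete illustration of the approach described above, the following is a minimal, hypothetical sketch (not the authors' implementation): a tabular abstract MDP accumulated from observed abstract transitions, planned over with value iteration, with an optimism bonus standing in for strategic exploration and a swappable reward function standing in for fast reward transfer. The abstraction function `phi`, the field name `agent_xy`, and the constants are illustrative assumptions; in the paper, each abstract action is additionally backed by a learned subpolicy that navigates between abstract states.

```python
# Minimal sketch of an abstract MDP over abstract states (illustrative only,
# not the authors' implementation).
from collections import defaultdict

GAMMA = 0.99  # discount factor (assumed)
BONUS = 1.0   # optimism weight for rarely visited abstract states (assumed)


def phi(concrete_state):
    """Hypothetical abstraction: keep a coarse agent position, ignore other objects."""
    x, y = concrete_state["agent_xy"]   # assumed field name, for illustration only
    return (x // 16, y // 16)           # coarse spatial bins


class AbstractMDP:
    """Tabular model over abstract states, learned from observed abstract transitions."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: count}
        self.reward = defaultdict(float)                     # (s, a) -> running mean reward
        self.visits = defaultdict(int)                       # s -> visit count

    def update(self, s, a, r, s_next):
        """Record one abstract transition (in the paper, executed by a learned subpolicy)."""
        self.counts[(s, a)][s_next] += 1
        n = sum(self.counts[(s, a)].values())
        self.reward[(s, a)] += (r - self.reward[(s, a)]) / n  # incremental mean
        self.visits[s_next] += 1

    def plan(self, reward_fn=None, bonus=0.0, iters=200):
        """Value iteration over the learned abstract model.

        reward_fn: optional override of the learned rewards; because dynamics are
        stored separately, a new reward function can be planned for without
        collecting new experience (fast reward transfer).
        bonus: optimism added for rarely visited abstract states, which steers
        planning (and hence exploration) toward the frontier of the model.
        """
        states = set(self.visits) | {s for (s, _) in self.counts}
        V = {s: 0.0 for s in states}
        for _ in range(iters):
            for s in states:
                best = 0.0
                for (s0, a), nexts in self.counts.items():
                    if s0 != s:
                        continue
                    n = sum(nexts.values())
                    r = reward_fn(s, a) if reward_fn else self.reward[(s, a)]
                    r += bonus / (1.0 + self.visits[s])
                    q = r + GAMMA * sum(c / n * V.get(s1, 0.0)
                                        for s1, c in nexts.items())
                    best = max(best, q)
                V[s] = best
        return V
```

In this sketch, planning with `model.plan(bonus=BONUS)` during training steers the agent toward under-visited abstract states (strategic exploration), while `model.plan(reward_fn=new_task_reward)` after training reuses the same learned transition counts for a new task; this decoupling of dynamics from rewards is what makes the reported 1000x sample reduction possible.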
Related papers
- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction [19.59151245929067]
We study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning.
We find this problem is best solved hierarchically by modelling items at a higher level of state abstraction than pixels.
We make use of this to propose a fully model-based algorithm that learns a discriminative world model.
arXiv Detail & Related papers (2024-08-21T17:59:31Z)
- Learning Abstract World Model for Value-preserving Planning with Options [11.254212901595523]
We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs).
We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP.
We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.
arXiv Detail & Related papers (2024-06-22T13:41:02Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel hierarchical model-based reinforcement learning (HMBRL) framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low-dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph [62.685920585838616]
Abstraction ability is essential to human intelligence, yet it remains under-explored in language models.
We present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge.
arXiv Detail & Related papers (2023-11-15T18:11:23Z)
- Exploiting Multiple Abstractions in Episodic RL via Reward Shaping [23.61187560936501]
We consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain.
We propose a novel form of reward shaping in which the solution obtained at the abstract level is used to provide shaping rewards to the more concrete MDP (a generic potential-based shaping sketch in this spirit appears after this list).
arXiv Detail & Related papers (2023-02-28T13:22:29Z)
- Does Deep Learning Learn to Abstract? A Systematic Probing Framework [69.2366890742283]
Abstraction, the ability to induce abstract concepts from concrete instances and apply them flexibly beyond the learning context, is a desirable capability for deep learning models.
We introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective.
arXiv Detail & Related papers (2023-02-23T12:50:02Z)
- Discrete State-Action Abstraction via the Successor Representation [3.453310639983932]
Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space.
Our approach is the first to automatically learn a discrete abstraction of the underlying environment.
Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively swaps between training these options and using them to efficiently explore more of the environment.
arXiv Detail & Related papers (2022-06-07T17:37:30Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation to scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z)
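The reward-shaping entry above ("Exploiting Multiple Abstractions in Episodic RL via Reward Shaping") describes using the solution of an abstract MDP to reward the concrete MDP; that idea maps naturally onto standard potential-based shaping (Ng et al., 1999). The sketch below illustrates that general technique under assumed names (`abstract_V`, `phi`); it is not the cited paper's exact algorithm.

```python
# Illustrative potential-based shaping with an abstract-level solution
# (generic sketch of the technique, not the cited paper's exact method).
GAMMA = 0.99  # discount factor (assumed)


def shaped_reward(r, s, s_next, abstract_V, phi):
    """Shape the concrete reward using the abstract value function as a potential.

    Potential-based shaping (Ng et al., 1999) leaves the optimal policies of the
    concrete MDP unchanged while densifying its reward signal.
    abstract_V: dict from abstract states to values obtained by solving the
                abstract MDP (assumed available).
    phi:        abstraction function from concrete states to abstract states.
    """
    potential = abstract_V.get(phi(s), 0.0)
    potential_next = abstract_V.get(phi(s_next), 0.0)
    return r + GAMMA * potential_next - potential
```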