An Analysis of Model-Based Reinforcement Learning From Abstracted
Observations
- URL: http://arxiv.org/abs/2208.14407v3
- Date: Wed, 15 Nov 2023 12:39:37 GMT
- Title: An Analysis of Model-Based Reinforcement Learning From Abstracted
Observations
- Authors: Rolf A. N. Starre, Marco Loog, Elena Congeduti, Frans A. Oliehoek
- Abstract summary: We show that abstraction can introduce a dependence between samples collected online (e.g., in the real world), which means existing results for Model-based Reinforcement Learning (MBRL) do not directly carry over to this setting.
We show that we can use concentration inequalities for martingales to overcome this problem.
We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based 'RL from Abstracted Observations'.
- Score: 24.964038353043918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many methods for Model-based Reinforcement learning (MBRL) in Markov decision
processes (MDPs) provide guarantees for both the accuracy of the model they can
deliver and the learning efficiency. At the same time, state abstraction
techniques allow for a reduction of the size of an MDP while maintaining a
bounded loss with respect to the original problem. Therefore, it may come as a
surprise that no such guarantees are available when combining both techniques,
i.e., where MBRL merely observes abstract states. Our theoretical analysis
shows that abstraction can introduce a dependence between samples collected
online (e.g., in the real world). That means that, without taking this
dependence into account, results for MBRL do not directly extend to this
setting. Our result shows that we can use concentration inequalities for
martingales to overcome this problem. This result makes it possible to extend
the guarantees of existing MBRL algorithms to the setting with abstraction. We
illustrate this by combining R-MAX, a prototypical MBRL algorithm, with
abstraction, thus producing the first performance guarantees for model-based
'RL from Abstracted Observations': model-based reinforcement learning with an
abstract model.
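The following is an illustrative sketch, not material from the paper. The concentration inequalities for martingales that the abstract refers to are of the Azuma-Hoeffding type: for a martingale difference sequence $X_1, \dots, X_n$ with $|X_i| \le c$,

  $P\left( \left| \sum_{i=1}^{n} X_i \right| \ge \epsilon \right) \le 2 \exp\left( \frac{-\epsilon^2}{2 n c^2} \right)$,

and, unlike the standard Hoeffding bound, this does not require the $X_i$ to be independent, which is why it remains usable when abstraction couples the samples collected online.

Below is a minimal, hypothetical Python sketch of how an R-MAX-style learner could be combined with a state abstraction phi in the 'RL from Abstracted Observations' setting. The class name AbstractRMax, the threshold m_known, and every other identifier are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch, not the authors' code: an R-MAX-style learner that only
# observes abstract states phi(s). Statistics are kept per abstract
# state-action pair; a pair becomes "known" after m_known samples, where
# m_known would be chosen via a martingale (Azuma-Hoeffding-style) bound,
# since successive samples from the same abstract state need not be independent.
from collections import defaultdict

class AbstractRMax:
    def __init__(self, phi, actions, r_max, m_known, gamma=0.95):
        self.phi = phi            # abstraction: ground state -> abstract state
        self.actions = actions    # finite action set (used by an external planner)
        self.r_max = r_max        # optimistic reward assigned to unknown pairs
        self.m = m_known          # samples needed before a pair counts as known
        self.gamma = gamma        # discount factor (used by the planner)
        self.counts = defaultdict(int)                      # (x, a) -> n
        self.trans = defaultdict(lambda: defaultdict(int))  # (x, a) -> {x': count}
        self.rew_sum = defaultdict(float)                   # (x, a) -> summed reward

    def update(self, s, a, r, s_next):
        """Record one online transition, abstracted through phi."""
        x, x_next = self.phi(s), self.phi(s_next)
        if self.counts[(x, a)] < self.m:      # freeze statistics once known
            self.counts[(x, a)] += 1
            self.trans[(x, a)][x_next] += 1
            self.rew_sum[(x, a)] += r

    def model(self, x, a):
        """Empirical abstract model; optimistic (r_max, self-loop) while unknown."""
        n = self.counts[(x, a)]
        if n < self.m:
            return self.r_max, {x: 1.0}       # optimism drives exploration
        p_hat = {x_next: c / n for x_next, c in self.trans[(x, a)].items()}
        return self.rew_sum[(x, a)] / n, p_hat

Value iteration (or any planner) over the optimistic abstract model returned by model() would then yield the acting policy; the choice of m_known is exactly where a martingale-based analysis would replace the i.i.d. concentration argument used for R-MAX in ground MDPs.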
Related papers
- Offline Model-Based Reinforcement Learning with Anti-Exploration [0.0]
We present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL.
MoMo performs offline reinforcement learning using an anti-exploration bonus to counteract value overestimation.
MoMo outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
arXiv Detail & Related papers (2024-08-20T10:29:21Z)
- Exploiting Multiple Abstractions in Episodic RL via Reward Shaping [23.61187560936501]
We consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain.
We propose a novel form of Reward Shaping where the solution obtained at the abstract level is used to offer rewards to the more concrete MDP.
arXiv Detail & Related papers (2023-02-28T13:22:29Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee for model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a causal dynamics model, with theoretical guarantees, that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm guarantees sample complexity and only uses access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
- Stealing Deep Reinforcement Learning Models for Fun and Profit [33.64948529132546]
This paper presents the first model extraction attack against Deep Reinforcement Learning (DRL).
It enables an external adversary to precisely recover a black-box DRL model only from its interaction with the environment.
arXiv Detail & Related papers (2020-06-09T03:24:35Z)