Exploiting Multiple Abstractions in Episodic RL via Reward Shaping
- URL: http://arxiv.org/abs/2303.00516v2
- Date: Fri, 4 Aug 2023 14:22:02 GMT
- Title: Exploiting Multiple Abstractions in Episodic RL via Reward Shaping
- Authors: Roberto Cipollone, Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi
- Abstract summary: We consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain.
We propose a novel form of Reward Shaping where the solution obtained at the abstract level is used to offer rewards to the more concrete MDP.
- Score: 23.61187560936501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One major limitation to the applicability of Reinforcement Learning (RL) to
many practical domains is the large number of samples required to learn an
optimal policy. To address this problem and improve learning efficiency, we
consider a linear hierarchy of abstraction layers of the Markov Decision
Process (MDP) underlying the target domain. Each layer is an MDP representing a
coarser model of the one immediately below in the hierarchy. In this work, we
propose a novel form of Reward Shaping where the solution obtained at the
abstract level is used to offer rewards to the more concrete MDP, in such a way
that the abstract solution guides the learning in the more complex domain. In
contrast with other works in Hierarchical RL, our technique has few
requirements in the design of the abstract models and it is also tolerant to
modeling errors, thus making the proposed approach practical. We formally
analyze the relationship between the abstract models and the exploration
heuristic induced in the lower-level domain. Moreover, we prove that the method
guarantees optimal convergence and we demonstrate its effectiveness
experimentally.
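
The idea of letting an abstract solution offer rewards to the concrete MDP can be illustrated with classic potential-based reward shaping. The sketch below is not the paper's exact formulation, which this summary does not spell out. It assumes that the potential of a concrete state is the value of its abstract counterpart. The names `make_shaped_reward`, `abstract_values`, and `state_map`, as well as the toy corridor, are hypothetical.

```python
from typing import Callable, Dict, Hashable

State = Hashable
AbstractState = Hashable


def make_shaped_reward(
    abstract_values: Dict[AbstractState, float],
    state_map: Callable[[State], AbstractState],
    gamma: float,
) -> Callable[[State, float, State, bool], float]:
    """Build a reward function that adds a potential-based shaping term.

    Assumption: the potential of a concrete state is the value that the
    abstract solution assigns to the abstract state it maps to.
    """

    def potential(state: State) -> float:
        # Abstract states missing from the table get potential 0.
        return abstract_values.get(state_map(state), 0.0)

    def shaped_reward(
        state: State, reward: float, next_state: State, done: bool
    ) -> float:
        # Episodic convention: the potential after termination is 0.
        next_potential = 0.0 if done else potential(next_state)
        return reward + gamma * next_potential - potential(state)

    return shaped_reward


# Toy example: a corridor of six cells split into two abstract "rooms".
abstract_values = {"left": 0.5, "right": 1.0}  # assumed abstract solution


def state_map(cell: int) -> str:
    return "left" if cell < 3 else "right"


shape = make_shaped_reward(abstract_values, state_map, gamma=0.9)

# Moving into the higher-valued room earns a positive bonus:
# 0.0 + 0.9 * 1.0 - 0.5, roughly 0.4.
crossing = shape(2, 0.0, 3, False)

# Moving within the same room earns a small penalty:
# 0.0 + 0.9 * 0.5 - 0.5, roughly -0.05.
staying = shape(1, 0.0, 2, False)
```

Any learner for the concrete MDP can consume `shaped_reward` in place of the raw reward. Because the added term is a difference of potentials, an inaccurate abstract value changes what the agent explores first but not, in the classic potential-based setting, which policies are optimal.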
Related papers
- Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms [34.593772931446125]
This monograph explores model-based and model-free approaches to constrained reinforcement learning in average-reward Markov Decision Processes (MDPs).
The primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs.
arXiv Detail & Related papers (2024-06-17T12:46:02Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning [1.4542411354617986]
Intelligent Cyber-Physical Systems (ICPS) represent a specialized form of Cyber-Physical System (CPS).
CNNs and Deep Reinforcement Learning (DRL) undertake multifaceted tasks encompassing perception, decision-making, and control.
DRL faces challenges in efficiency, generalization, and data scarcity during the decision-making process.
We propose an innovative abstract modeling approach grounded in spatial-temporal value semantics.
arXiv Detail & Related papers (2024-05-24T02:21:10Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- Learning Dynamic Abstract Representations for Sample-Efficient Reinforcement Learning [22.25237742815589]
In many real-world problems, the learning agent needs to learn a problem's abstractions and solution simultaneously.
This paper presents a novel top-down approach for constructing state abstractions while carrying out reinforcement learning.
arXiv Detail & Related papers (2022-10-04T23:05:43Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, namely OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
- Learning Abstract Models for Strategic Exploration and Fast Reward Transfer [85.19766065886422]
We learn an accurate Markov Decision Process (MDP) over abstract states to avoid compounding errors.
Our approach achieves strong results on three of the hardest Arcade Learning Environment games.
We can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.
arXiv Detail & Related papers (2020-07-12T03:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.