On learning history based policies for controlling Markov decision
processes
- URL: http://arxiv.org/abs/2211.03011v1
- Date: Sun, 6 Nov 2022 02:47:55 GMT
- Title: On learning history based policies for controlling Markov decision
processes
- Authors: Gandharv Patil, Aditya Mahajan, Doina Precup
- Abstract summary: We introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP.
We numerically evaluate its effectiveness on a set of continuous control tasks.
- Score: 44.17941122294582
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as
recurrent neural nets or history-based state abstraction, perform better than
their memory-less counterparts, due to the fact that function approximation in
Markov decision processes (MDPs) can be viewed as inducing a partially
observable MDP (POMDP). However, there has been little formal analysis of such
history-based algorithms, as most existing frameworks focus exclusively on
memory-less features. In this paper, we introduce a theoretical framework for
studying the behaviour of RL algorithms that learn to control an MDP using
history-based feature abstraction mappings. Furthermore, we use this framework
to design a practical RL algorithm and we numerically evaluate its
effectiveness on a set of continuous control tasks.
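
To make the contrast between memory-less and history-based feature abstraction concrete, here is a minimal PyTorch sketch. It is not the algorithm proposed in the paper; the class names, dimensions, and the choice of a GRU encoder are hypothetical and only illustrate how a history-based feature map phi(h_t) differs from a memory-less map phi(s_t).

```python
# Minimal sketch (assumption-laden, not the paper's method): a memory-less
# feature map phi(s_t) versus a history-based feature map phi(h_t) that
# compresses the observation history with a recurrent network.
import torch
import torch.nn as nn

class MemorylessFeatures(nn.Module):
    """phi(s_t): features computed from the current observation only."""
    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())

    def forward(self, obs):                      # obs: (batch, obs_dim)
        return self.net(obs)                     # (batch, feat_dim)

class HistoryFeatures(nn.Module):
    """phi(h_t): features computed from the observation history,
    summarised here by a GRU (one possible history-based abstraction)."""
    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, feat_dim, batch_first=True)

    def forward(self, obs_history):              # (batch, T, obs_dim)
        _, last_hidden = self.rnn(obs_history)   # (1, batch, feat_dim)
        return last_hidden.squeeze(0)            # (batch, feat_dim)

# Either feature map can feed the same policy head, so the two abstractions
# are directly comparable on a given control task (all sizes are made up).
policy_head = nn.Linear(64, 4)                   # feat_dim=64, 4 actions
obs_hist = torch.randn(8, 20, 10)                # 8 histories, T=20, obs_dim=10
logits = policy_head(HistoryFeatures(10, 64)(obs_hist))
```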
Related papers
- Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method [0.0]
This paper presents a novel reinforcement learning approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning).
The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes.
arXiv Detail & Related papers (2024-03-21T03:42:39Z) - On the Markov Property of Neural Algorithmic Reasoning: Analyses and
Methods [94.72563337153268]
We present ForgetNet, which does not use historical embeddings and thus is consistent with the Markov nature of the tasks.
We also introduce G-ForgetNet, which uses a gating mechanism to allow for the selective integration of historical embeddings (a generic sketch of such a gate appears after this list).
Our experiments, based on the CLRS-30 algorithmic reasoning benchmark, demonstrate that both ForgetNet and G-ForgetNet achieve better generalization capability than existing methods.
arXiv Detail & Related papers (2024-03-07T22:35:22Z) - Bridging State and History Representations: Understanding Self-Predictive RL [24.772140132462468]
Representations are at the core of all deep reinforcement learning (RL) methods for Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs).
We show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction.
We provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations.
arXiv Detail & Related papers (2024-01-17T00:47:43Z) - Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Reinforcement Learning with History-Dependent Dynamic Contexts [29.8131459650617]
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments.
We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions.
Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features.
arXiv Detail & Related papers (2023-02-04T01:58:21Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
arXiv Detail & Related papers (2022-10-17T16:06:06Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
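
The G-ForgetNet entry above mentions a gating mechanism for the selective integration of historical embeddings. Below is a generic, hypothetical PyTorch sketch of that pattern; it is not the published architecture, and every name and dimension is an assumption made for illustration only.

```python
# Generic sketch of a gate that selectively mixes a historical embedding into
# the current one (illustrative only, not the published G-ForgetNet model).
import torch
import torch.nn as nn

class GatedHistoryMix(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # The gate is conditioned on both the current and previous embeddings.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, current_emb, prev_emb):
        g = self.gate(torch.cat([current_emb, prev_emb], dim=-1))  # in (0, 1)
        # g near 0 recovers a purely Markov (history-free) update;
        # g near 1 passes the historical embedding through almost unchanged.
        return current_emb + g * prev_emb

mix = GatedHistoryMix(dim=32)
out = mix(torch.randn(4, 32), torch.randn(4, 32))   # shape (4, 32)
```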
This list is automatically generated from the titles and abstracts of the papers in this site.