MDP Geometry, Normalization and Value Free Solvers
- URL: http://arxiv.org/abs/2407.06712v1
- Date: Tue, 9 Jul 2024 09:39:45 GMT
- Title: MDP Geometry, Normalization and Value Free Solvers
- Authors: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
- Abstract summary: We present a new geometric interpretation of MDPs, which is useful for analyzing the dynamics of the main MDP algorithms.
We show that MDPs can be split into equivalence classes with indistinguishable algorithm dynamics.
- Score: 15.627546283580166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Markov Decision Process (MDP) is a common mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs, which is useful for analyzing the dynamics of the main MDP algorithms. Based on this interpretation, we demonstrate that MDPs can be split into equivalence classes with indistinguishable algorithm dynamics. The related normalization procedure allows for the design of a new class of MDP-solving algorithms that find optimal policies without computing policy values.
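To make concrete what "computing policy values" means in the abstract above, here is a minimal sketch of standard value iteration on a toy MDP. The two-state, two-action model and all its numbers are hypothetical, chosen only for illustration; this is the conventional value-based baseline, not the geometric, value-free method the paper proposes.

```python
import numpy as np

# A tiny 2-state, 2-action MDP (hypothetical numbers, for illustration):
# P[a, s, s'] = transition probability, R[a, s] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
R = np.array([
    [1.0, 0.0],                 # action 0
    [0.5, 2.0],                 # action 1
])
gamma = 0.9                     # discount factor

# Standard value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') * V(s') ].
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V       # Q[a, s]: state-action values
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)       # greedy policy extracted from the values
```

The algorithm must converge the value estimates `V` before it can read off a policy; a "value-free" solver, by contrast, would identify the optimal policy without this intermediate computation.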
Related papers
- Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming [8.495921422521068]
The multi-model Markov decision process (MMDP) is a promising framework for computing policies under model uncertainty.
MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models.
We propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs.
arXiv Detail & Related papers (2024-07-08T18:47:59Z) - Domain-Independent Dynamic Programming [5.449167190254984]
Domain-independent dynamic programming (DIDP) is a new model-based paradigm based on dynamic programming (DP).
We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by AI planning.
We show that search algorithms can be used to solve DyPDL models and propose seven DIDP solvers.
arXiv Detail & Related papers (2024-01-25T01:48:09Z) - Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z) - Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z) - A Survey for Solving Mixed Integer Programming via Machine Learning [76.04988886859871]
This paper surveys the trend of using machine learning to solve mixed integer programming (MIP) problems.
In this paper, we first introduce the formulation and preliminaries of MIP and several traditional algorithms to solve MIP.
Then, we advocate deeper integration of machine learning and MIP algorithms.
arXiv Detail & Related papers (2022-03-06T05:03:37Z) - CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov Decision Process Multidimensional Problem [21.79259092920586]
We develop an MDP solver for a multidimensional problem using a tensor decomposition method.
We show that our approach can compute much larger problems using substantially less memory.
arXiv Detail & Related papers (2021-02-27T21:33:19Z) - A Relation Analysis of Markov Decision Process Frameworks [26.308541799686505]
We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literature.
We show that the entropy-regularized MDP is equivalent to an MDP model, and is strictly subsumed by the general regularized MDP.
arXiv Detail & Related papers (2020-08-18T09:27:26Z)
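The entropy-regularized MDP mentioned in the entry above differs from a standard MDP only in its Bellman backup: the hard max over actions becomes a temperature-scaled log-sum-exp. The sketch below contrasts the two on the same hypothetical toy model used only for illustration; it is not taken from the cited paper.

```python
import numpy as np

# Toy MDP (hypothetical numbers): P[a, s, s'], R[a, s].
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([
    [1.0, 0.0],
    [0.5, 2.0],
])
gamma, tau = 0.9, 0.1   # discount factor and entropy temperature

# Entropy-regularized ("soft") value iteration replaces the hard max with
# V(s) <- tau * log sum_a exp( Q(a, s) / tau ),
# where Q(a, s) = R(a, s) + gamma * sum_s' P(a, s, s') * V(s').
V = np.zeros(2)
for _ in range(2000):
    Q = R + gamma * P @ V
    m = Q.max(axis=0)                            # stabilize the log-sum-exp
    V_new = m + tau * np.log(np.exp((Q - m) / tau).sum(axis=0))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

Since log-sum-exp upper-bounds the max, the soft values dominate the hard-max backup of the same Q-values, and as `tau -> 0` the soft backup recovers ordinary value iteration.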
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.