MDP Geometry, Normalization and Value Free Solvers
- URL: http://arxiv.org/abs/2407.06712v2
- Date: Mon, 30 Sep 2024 20:00:09 GMT
- Title: MDP Geometry, Normalization and Value Free Solvers
- Authors: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
- Abstract summary: The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems.
We show that MDPs can be divided into equivalence classes in which the dynamics of key solving algorithms are indistinguishable.
- Score: 15.627546283580166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs. Based on this interpretation, we show that MDPs can be divided into equivalence classes in which the dynamics of key solving algorithms are indistinguishable. The related normalization procedure enables the development of a novel class of MDP-solving algorithms that find optimal policies without computing policy values. The new algorithms we propose for different settings achieve, and in some cases improve upon, state-of-the-art results.
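For context on what "value free" avoids: classical solvers such as value iteration repeatedly compute state values via Bellman updates. The sketch below shows this standard baseline for a finite MDP; it is a generic textbook routine, not the paper's algorithm, and the toy transition/reward numbers are purely illustrative.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Classical value iteration for a finite MDP.

    P: transition tensor, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    R: reward matrix, shape (A, S);        R[a, s]     = expected reward
    Returns the optimal value vector and a greedy optimal policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality update: Q[a, s] = R[a, s] + gamma * E[V(s')]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=0)

# Tiny two-state, two-action toy MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = value_iteration(P, R)
print("V* =", V, "greedy policy =", pi)
```

In contrast, the algorithms proposed in the paper find optimal policies without maintaining a value vector like `V` in this loop.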
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
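For reference, Value-at-Risk is the quantile of the return distribution; a standard textbook definition (not quoted from the paper) for a random return $X$ and level $\tau \in (0,1)$ is:

```latex
\[
\mathrm{VaR}_{\tau}(X) \;=\; \inf\bigl\{\, x \in \mathbb{R} \;:\; \Pr(X \le x) \ge \tau \,\bigr\}
\]
```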
arXiv Detail & Related papers (2024-10-31T16:53:20Z)
- Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming [8.495921422521068]
The multi-model Markov decision process (MMDP) is a promising framework for computing policies under model uncertainty.
MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models.
We propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs.
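Written out, the MMDP objective above maximizes the return averaged over a weight distribution $\lambda$ on the candidate models (a generic statement of the objective, not copied from the paper's abstract):

```latex
\[
\max_{\pi}\; \sum_{m=1}^{M} \lambda_m\, \mathbb{E}^{\pi}_{m}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
\]
```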
arXiv Detail & Related papers (2024-07-08T18:47:59Z)
- Global Algorithms for Mean-Variance Optimization in Markov Decision Processes [8.601670707452083]
Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming: the variance of the accumulated reward does not satisfy the principle of optimality.
We propose a new approach to find the globally optimal policy for combined metrics of steady-state mean and variance in an infinite-horizon undiscounted MDP.
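A common way to combine steady-state mean and variance into a single metric is a variance-penalized objective with weight $\beta > 0$; the form below is a standard formulation assumed for illustration, not necessarily the exact metric used in the paper:

```latex
\[
\max_{\pi}\; \mu_{\pi} - \beta\, \sigma^{2}_{\pi},
\qquad
\mu_{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} r_t\right],
\qquad
\sigma^{2}_{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} \bigl(r_t - \mu_{\pi}\bigr)^{2}\right]
\]
```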
arXiv Detail & Related papers (2023-02-27T12:17:43Z)
- Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
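For reference, in the discrete case an MDP homomorphism (in the sense of Ravindran and Barto) is a map $h = (f, \{g_s\})$ between MDPs that preserves rewards and aggregated transition probabilities; the paper extends conditions of this form to continuous state and action spaces:

```latex
\[
r'\bigl(f(s),\, g_s(a)\bigr) = r(s, a),
\qquad
P'\bigl(f(s') \mid f(s),\, g_s(a)\bigr) = \sum_{s'' \in f^{-1}(f(s'))} P(s'' \mid s, a)
\]
```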
arXiv Detail & Related papers (2022-09-15T15:26:49Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
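For reference, the usual definition of a linear (low-rank) MDP factors transitions and rewards through a feature map $\phi$; the paper's alternative definition builds normalization into such a factorization (the equations below are the standard form, not the paper's variant):

```latex
\[
P(s' \mid s, a) = \bigl\langle \phi(s, a),\, \mu(s') \bigr\rangle,
\qquad
r(s, a) = \bigl\langle \phi(s, a),\, \theta \bigr\rangle
\]
```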
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- A Survey for Solving Mixed Integer Programming via Machine Learning [76.04988886859871]
This paper surveys the trend of using machine learning to solve mixed integer programming (MIP) problems.
We first introduce the formulation and preliminaries of MIP and several traditional algorithms for solving it.
Then, we advocate further promoting different ways of integrating machine learning and MIP algorithms.
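The MIP formulation the survey starts from is the standard one (stated here for reference, not quoted from the survey):

```latex
\[
\min_{x}\; c^{\top} x
\quad \text{s.t.} \quad
A x \le b,
\qquad
x_j \in \mathbb{Z} \;\; \text{for all } j \in \mathcal{I}
\]
```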
arXiv Detail & Related papers (2022-03-06T05:03:37Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
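For reference, regret over $T$ steps is typically measured against the optimal long-run average reward (gain) $\rho^{*}$; this is a standard definition for average-reward learning, assumed here rather than quoted from the paper:

```latex
\[
R_T \;=\; T\,\rho^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right]
\]
```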
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov Decision Process Multidimensional Problem [21.79259092920586]
We develop an MDP solver for a multidimensional problem using a tensor decomposition method.
We show that our approach can solve much larger problems using substantially less memory.
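For reference, the CANDECOMP-PARAFAC (CP) decomposition at the core of this approach expresses a tensor as a sum of rank-one components, e.g. for a third-order tensor (standard definition); storing only the factors is what enables the memory savings reported above:

```latex
\[
\mathcal{T} \;\approx\; \sum_{r=1}^{R} \lambda_r\; a_r \circ b_r \circ c_r
\]
```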
arXiv Detail & Related papers (2021-02-27T21:33:19Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
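For contrast, the widely used squared Bellman error that the proposed convex loss replaces is typically written as follows (a standard formulation, not quoted from the paper):

```latex
\[
L(\theta) \;=\; \mathbb{E}_{(s, a, r, s')}\!\left[\Bigl(r + \gamma \max_{a'} Q_{\theta}(s', a') - Q_{\theta}(s, a)\Bigr)^{2}\right]
\]
```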
arXiv Detail & Related papers (2020-10-21T17:14:31Z)