Geometric Re-Analysis of Classical MDP Solving Algorithms
- URL: http://arxiv.org/abs/2503.04203v1
- Date: Thu, 06 Mar 2025 08:29:36 GMT
- Title: Geometric Re-Analysis of Classical MDP Solving Algorithms
- Authors: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
- Abstract summary: We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze two algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings.
- Score: 15.627546283580166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when a Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
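For concreteness, here is a minimal tabular Value Iteration loop showing the standard $\gamma$-contraction baseline that the paper sharpens; the toy MDP, rewards, and tolerance are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[a] is the transition matrix of action a,
# R[a] the per-state reward of action a.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
gamma = 0.9

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.stack([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    done = np.max(np.abs(V_new - V)) < 1e-10   # each sweep contracts the sup-norm error by at least gamma
    V = V_new
    if done:
        break

print("V* ≈", V, " greedy policy:", Q.argmax(axis=0))
```

The paper's rotation-based result concerns the asymptotic behavior of exactly this iteration: when the MRP induced by the optimal policy is irreducible and aperiodic, the asymptotic rate is strictly smaller than the worst-case $\gamma$.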
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
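For reference, Value-at-Risk at level tau is just the tau-quantile of the return distribution; a minimal sketch (the sampled returns, level, and sign convention below are illustrative assumptions).

```python
import numpy as np

# Hypothetical episode returns; conventions for VaR (returns vs. losses, lower vs. upper tail) vary.
returns = np.random.default_rng(0).normal(loc=10.0, scale=3.0, size=10_000)
tau = 0.05
var_tau = np.quantile(returns, tau)   # threshold below which returns fall with probability tau
print(f"VaR at level {tau}: {var_tau:.2f}")
```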
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation [57.10353686244835]
We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation.
The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz optimisation.
We show that our proposed implementation resolves up to 32% more verification cases than existing approaches.
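The idea of provably sound bounds from sampling plus Lipschitz information can be sketched in one dimension; this generic example is ours, not the paper's verification procedure, and the function and constant are hypothetical.

```python
import numpy as np

def sound_upper_bound(f, L, lo, hi, n=100):
    """Sound upper bound on max f over [lo, hi] for an L-Lipschitz f: between two samples
    the function can exceed the larger endpoint value by at most L * (half cell width)."""
    xs = np.linspace(lo, hi, n)
    vals = f(xs)
    cell = (hi - lo) / (n - 1)
    return np.max(np.maximum(vals[:-1], vals[1:]) + L * cell / 2)

# sin is 1-Lipschitz; the bound is guaranteed >= the true max (1.0) and tightens as n grows.
print(sound_upper_bound(np.sin, L=1.0, lo=0.0, hi=np.pi, n=100))
```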
arXiv Detail & Related papers (2024-08-23T15:02:09Z) - MDP Geometry, Normalization and Reward Balancing Solvers [15.627546283580166]
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure.
This advantage-preserving transformation of the MDP motivates a class of algorithms which we call Reward Balancing.
We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
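One concrete example of an advantage-preserving reward transformation is classical potential-based shaping; the check below is offered only as an analogy for the normalization idea, not as the paper's specific procedure, and the MDP and potential are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))                     # R[s, a]
phi = rng.normal(size=nS)                         # arbitrary state potential

def advantages(R):
    pi = np.full((nS, nA), 1.0 / nA)              # fixed uniform policy
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * R).sum(axis=1))   # V^pi
    Q = R + gamma * P @ V                         # Q^pi(s, a)
    return Q - V[:, None]                         # A^pi(s, a)

# Shaped reward r'(s,a) = r(s,a) + gamma * E[phi(s')] - phi(s) leaves all advantages unchanged.
R_shaped = R + gamma * P @ phi - phi[:, None]
print(np.allclose(advantages(R), advantages(R_shaped)))   # True
```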
arXiv Detail & Related papers (2024-07-09T09:39:45Z) - Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method that meets these requirements by minimizing the objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which branches solely over the parameters, thereby boosting the convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art approaches.
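A generic one-dimensional branch-and-bound sketch conveys the branching-over-parameters idea; the objective and the Lipschitz lower bound below are our own illustrative choices, not the RPM objective.

```python
import heapq
import numpy as np

def bnb_minimize(f, lo, hi, L, tol=1e-3):
    """Branch and bound for an L-Lipschitz f on [lo, hi]; a cell's lower bound is
    f(midpoint) - L * half_width, and cells that cannot beat the incumbent are pruned."""
    best_x, best_val = lo, f(lo)
    heap = [(f((lo + hi) / 2) - L * (hi - lo) / 2, lo, hi)]
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best_val - tol:                   # prune: cannot improve by more than tol
            continue
        m = (a + b) / 2
        if f(m) < best_val:
            best_x, best_val = m, f(m)
        for a2, b2 in ((a, m), (m, b)):           # branch by splitting the cell
            heapq.heappush(heap, (f((a2 + b2) / 2) - L * (b2 - a2) / 2, a2, b2))
    return best_x, best_val

print(bnb_minimize(lambda x: np.cos(3 * x) + 0.1 * x, lo=0.0, hi=4.0, L=3.1))
```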
arXiv Detail & Related papers (2024-05-14T13:28:57Z) - From Optimization to Control: Quasi Policy Iteration [3.4376560669160394]
We introduce a novel control algorithm coined as quasi-policy iteration (QPI).
QPI is based on a novel approximation of the "Hessian" matrix in the policy iteration algorithm by exploiting two linear structural constraints specific to MDPs.
It exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
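For reference, the plain policy iteration loop that QPI modifies alternates exact policy evaluation with greedy improvement; a minimal tabular sketch on a random MDP (this is standard PI, not QPI's quasi-Newton variant).

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

pi = np.zeros(nS, dtype=int)                      # arbitrary initial deterministic policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    V = np.linalg.solve(np.eye(nS) - gamma * P[np.arange(nS), pi], R[np.arange(nS), pi])
    # Policy improvement: act greedily with respect to Q^pi
    Q = R + gamma * P @ V
    pi_new = Q.argmax(axis=1)
    if np.array_equal(pi_new, pi):                # PI stops at an optimal policy after finitely many steps
        break
    pi = pi_new

print("optimal policy:", pi, " V*:", np.round(V, 3))
```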
arXiv Detail & Related papers (2023-11-18T21:00:14Z) - Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice [79.48432795639403]
Mirror descent value iteration (MDVI) is an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL).
We study MDVI with linear function approximation through the sample complexity required to identify an $\varepsilon$-optimal policy.
We present Variance-Weighted Least-Squares MDVI, the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs.
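The entropy-regularized backup behind this line of work replaces the hard max with a temperature-scaled log-sum-exp; a minimal tabular soft value iteration sketch (it illustrates the regularization only, not MDVI's KL term, mirror-descent scheme, or linear function approximation).

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
nS, nA, gamma, tau = 3, 2, 0.9, 0.5               # tau: entropy temperature
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

V = np.zeros(nS)
for _ in range(2000):
    Q = R + gamma * P @ V
    V = tau * logsumexp(Q / tau, axis=1)          # soft backup: tau * log sum_a exp(Q(s,a)/tau)

pi = np.exp((Q - V[:, None]) / tau)               # induced softmax (Boltzmann) policy
print(np.round(V, 3), pi.sum(axis=1))             # rows of pi sum to 1
```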
arXiv Detail & Related papers (2023-05-22T16:13:05Z) - Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes [18.35462792871242]
Policy Mirror Descent (PMD) is a family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.
Motivated by the instability of policy iteration with inexact policy evaluation, PMD algorithmically regularises the policy improvement step of PI.
We show that the dimension-free $\gamma$-rate of PI can be achieved by the general family of unregularised PMD algorithms under an adaptive step-size.
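With the KL divergence as the mirror map, the PMD policy update is multiplicative; a minimal tabular sketch with exact evaluation and a fixed step size (the paper's adaptive step-size rule is not reproduced, and the MDP is a random placeholder).

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, eta = 3, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

pi = np.full((nS, nA), 1.0 / nA)
for _ in range(200):
    # Exact policy evaluation of the current stochastic policy
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * R).sum(axis=1))
    Q = R + gamma * P @ V
    # PMD / KL update: pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q_k(s,a))
    pi = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))   # per-state shift for numerical stability
    pi /= pi.sum(axis=1, keepdims=True)

print("greedy actions:", Q.argmax(axis=1), " policy:\n", np.round(pi, 3))
```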
arXiv Detail & Related papers (2023-02-22T13:55:08Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
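The clipped surrogate that PPO, and hence this extension, optimizes can be written in a few lines; the sketch below shows the loss only, with placeholder inputs, and does not include the paper's handling of topological constraints.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (to be maximized): mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(ratio * advantages, clipped).mean()

# Hypothetical batch of log-probabilities and advantage estimates
rng = np.random.default_rng(4)
logp_old = rng.normal(-1.0, 0.1, size=64)
logp_new = logp_old + rng.normal(0.0, 0.05, size=64)
print(ppo_clip_loss(logp_new, logp_old, rng.normal(size=64)))
```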
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - A unified algorithm framework for mean-variance optimization in discounted Markov decision processes [7.510742715895749]
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs).
We introduce a pseudo mean to transform the otherwise intractable MDP into a standard one with a redefined reward function.
We propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization.
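The pseudo-mean device rests on the identity E[(r - b)^2] = Var(r) + (E[r] - b)^2: for any fixed b, the redefined objective E[r] - lam * E[(r - b)^2] lower-bounds the mean-variance objective, with equality at b = E[r]. The numerical check below illustrates this identity only; the paper's exact reward redefinition and bilevel scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(5)
r = rng.normal(loc=2.0, scale=1.5, size=100_000)    # hypothetical per-step rewards
lam = 0.3

def surrogate(b):
    # Objective of a standard MDP with the redefined reward r - lam * (r - b)^2
    return r.mean() - lam * ((r - b) ** 2).mean()

bs = np.linspace(0.0, 4.0, 401)
best_b = bs[np.argmax([surrogate(b) for b in bs])]
print(best_b, r.mean())                             # the maximizing b is (approximately) the true mean
print(surrogate(best_b), r.mean() - lam * r.var())  # and the optimum matches mean - lam * variance
```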
arXiv Detail & Related papers (2022-01-15T02:19:56Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
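For context, the squared Bellman error that this convex loss is positioned against is the quantity below; this is a sketch of the standard baseline only, not of the paper's logistic loss, and the tabular inputs are random placeholders.

```python
import numpy as np

def squared_bellman_error(Q, R, P, pi, gamma):
    """Mean squared Bellman error of Q for evaluating policy pi."""
    V_pi = (pi * Q).sum(axis=1)                   # V_pi(s) = sum_a pi(a|s) Q(s, a)
    target = R + gamma * P @ V_pi                 # one-step Bellman target
    return np.mean((Q - target) ** 2)

rng = np.random.default_rng(6)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))
pi = np.full((nS, nA), 0.5)
print(squared_bellman_error(rng.normal(size=(nS, nA)), R, P, pi, gamma))
```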
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which would have been difficult to establish without our adopted S&C approach.
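The flavor of the Lyapunov argument can be seen on a toy non-gradient iteration, where the distance to the fixed point serves as a discrete-time Lyapunov function; this generic contraction example is ours and is not the MAP-EM analysis itself.

```python
import numpy as np

g = np.cos                        # a fixed-point iteration that is not a gradient step of any objective
x_star = 0.7390851332151607       # fixed point of cos(x) (the Dottie number)

x, V_prev = 2.0, abs(2.0 - x_star)
for k in range(15):
    x = g(x)
    V = abs(x - x_star)           # candidate Lyapunov function: distance to the fixed point
    assert V < V_prev             # strict decrease along the trajectory certifies convergence
    print(f"k={k:2d}  V={V:.2e}  V/V_prev={V / V_prev:.3f}")
    V_prev = V
```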
arXiv Detail & Related papers (2020-06-23T01:34:18Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
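The algorithm being evaluated is the classic TD(0) update for policy evaluation; a minimal sketch under a generative model (the chain, step size, and sample budget are hypothetical, and the constant step size leaves a small noise floor).

```python
import numpy as np

rng = np.random.default_rng(7)
nS, gamma, alpha = 5, 0.9, 0.05
P_pi = rng.dirichlet(np.ones(nS), size=nS)        # transition matrix of the MRP induced by the policy
R_pi = rng.normal(size=nS)

V = np.zeros(nS)
for _ in range(200_000):
    s = rng.integers(nS)                          # generative model: pick any state...
    s_next = rng.choice(nS, p=P_pi[s])            # ...and sample one transition from it
    V[s] += alpha * (R_pi[s] + gamma * V[s_next] - V[s])   # TD(0) update

V_exact = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
print(np.max(np.abs(V - V_exact)))                # the l_infinity error studied above
```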
arXiv Detail & Related papers (2020-03-16T17:15:28Z)