Geometric Re-Analysis of Classical MDP Solving Algorithms
- URL: http://arxiv.org/abs/2503.04203v1
- Date: Thu, 06 Mar 2025 08:29:36 GMT
- Title: Geometric Re-Analysis of Classical MDP Solving Algorithms
- Authors: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
- Abstract summary: We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze two algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings.
- Score: 15.627546283580166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when a Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
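For concreteness, here is a minimal tabular Value Iteration loop showing the standard $\gamma$-contraction baseline that the paper sharpens; the toy MDP, rewards, and tolerance are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[a] is the transition matrix of action a,
# R[a] the per-state reward of action a.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
gamma = 0.9

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.stack([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    done = np.max(np.abs(V_new - V)) < 1e-10   # each sweep contracts the sup-norm error by at least gamma
    V = V_new
    if done:
        break

print("V* ≈", V, " greedy policy:", Q.argmax(axis=0))
```

The paper's rotation-based result concerns the asymptotic behavior of exactly this iteration: when the MRP induced by the optimal policy is irreducible and aperiodic, the asymptotic rate is strictly smaller than the worst-case $\gamma$.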
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
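For reference, Value-at-Risk at level tau is just the tau-quantile of the return distribution; a minimal sketch (the sampled returns, level, and sign convention below are illustrative assumptions).

```python
import numpy as np

# Hypothetical episode returns; conventions for VaR (returns vs. losses, lower vs. upper tail) vary.
returns = np.random.default_rng(0).normal(loc=10.0, scale=3.0, size=10_000)
tau = 0.05
var_tau = np.quantile(returns, tau)   # threshold below which returns fall with probability tau
print(f"VaR at level {tau}: {var_tau:.2f}")
```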
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation [57.10353686244835]
We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation.
The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz optimisation.
We show that our proposed implementation resolves up to 32% more verification cases than existing approaches.
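The idea of provably sound bounds from sampling plus Lipschitz information can be sketched in one dimension; this generic example is ours, not the paper's verification procedure, and the function and constant are hypothetical.

```python
import numpy as np

def sound_upper_bound(f, L, lo, hi, n=100):
    """Sound upper bound on max f over [lo, hi] for an L-Lipschitz f: between two samples
    the function can exceed the larger endpoint value by at most L * (half cell width)."""
    xs = np.linspace(lo, hi, n)
    vals = f(xs)
    cell = (hi - lo) / (n - 1)
    return np.max(np.maximum(vals[:-1], vals[1:]) + L * cell / 2)

# sin is 1-Lipschitz; the bound is guaranteed >= the true max (1.0) and tightens as n grows.
print(sound_upper_bound(np.sin, L=1.0, lo=0.0, hi=np.pi, n=100))
```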
arXiv Detail & Related papers (2024-08-23T15:02:09Z) - MDP Geometry, Normalization and Reward Balancing Solvers [15.627546283580166]
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure.
This advantage-preserving transformation of the MDP motivates a class of algorithms which we call Reward Balancing.
We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
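One concrete example of an advantage-preserving reward transformation is classical potential-based shaping; the check below is offered only as an analogy for the normalization idea, not as the paper's specific procedure, and the MDP and potential are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))                     # R[s, a]
phi = rng.normal(size=nS)                         # arbitrary state potential

def advantages(R):
    pi = np.full((nS, nA), 1.0 / nA)              # fixed uniform policy
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * R).sum(axis=1))   # V^pi
    Q = R + gamma * P @ V                         # Q^pi(s, a)
    return Q - V[:, None]                         # A^pi(s, a)

# Shaped reward r'(s,a) = r(s,a) + gamma * E[phi(s')] - phi(s) leaves all advantages unchanged.
R_shaped = R + gamma * P @ phi - phi[:, None]
print(np.allclose(advantages(R), advantages(R_shaped)))   # True
```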
arXiv Detail & Related papers (2024-07-09T09:39:45Z) - Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method that meets these requirements by minimizing the objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which branches solely over the parameters, thereby boosting the convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art approaches.
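A generic one-dimensional branch-and-bound sketch conveys the branching-over-parameters idea; the objective and the Lipschitz lower bound below are our own illustrative choices, not the RPM objective.

```python
import heapq
import numpy as np

def bnb_minimize(f, lo, hi, L, tol=1e-3):
    """Branch and bound for an L-Lipschitz f on [lo, hi]; a cell's lower bound is
    f(midpoint) - L * half_width, and cells that cannot beat the incumbent are pruned."""
    best_x, best_val = lo, f(lo)
    heap = [(f((lo + hi) / 2) - L * (hi - lo) / 2, lo, hi)]
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best_val - tol:                   # prune: cannot improve by more than tol
            continue
        m = (a + b) / 2
        if f(m) < best_val:
            best_x, best_val = m, f(m)
        for a2, b2 in ((a, m), (m, b)):           # branch by splitting the cell
            heapq.heappush(heap, (f((a2 + b2) / 2) - L * (b2 - a2) / 2, a2, b2))
    return best_x, best_val

print(bnb_minimize(lambda x: np.cos(3 * x) + 0.1 * x, lo=0.0, hi=4.0, L=3.1))
```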
arXiv Detail & Related papers (2024-05-14T13:28:57Z) - From Optimization to Control: Quasi Policy Iteration [3.4376560669160394]
We introduce a novel control algorithm coined as quasi-policy iteration (QPI).
QPI is based on a novel approximation of the "Hessian" matrix in the policy iteration algorithm by exploiting two linear structural constraints specific to MDPs.
It exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
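For reference, the plain policy iteration loop that QPI modifies alternates exact policy evaluation with greedy improvement; a minimal tabular sketch on a random MDP (this is standard PI, not QPI's quasi-Newton variant).

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

pi = np.zeros(nS, dtype=int)                      # arbitrary initial deterministic policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    V = np.linalg.solve(np.eye(nS) - gamma * P[np.arange(nS), pi], R[np.arange(nS), pi])
    # Policy improvement: act greedily with respect to Q^pi
    Q = R + gamma * P @ V
    pi_new = Q.argmax(axis=1)
    if np.array_equal(pi_new, pi):                # PI stops at an optimal policy after finitely many steps
        break
    pi = pi_new

print("optimal policy:", pi, " V*:", np.round(V, 3))
```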
arXiv Detail & Related papers (2023-11-18T21:00:14Z) - Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice [79.48432795639403]
Mirror descent value iteration (MDVI) is an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL).
We study MDVI with linear function approximation through the sample complexity required to identify an $\varepsilon$-optimal policy.
We present Variance-Weighted Least-Squares MDVI, the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs.
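The entropy-regularized backup behind this line of work replaces the hard max with a temperature-scaled log-sum-exp; a minimal tabular soft value iteration sketch (it illustrates the regularization only, not MDVI's KL term, mirror-descent scheme, or linear function approximation).

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
nS, nA, gamma, tau = 3, 2, 0.9, 0.5               # tau: entropy temperature
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

V = np.zeros(nS)
for _ in range(2000):
    Q = R + gamma * P @ V
    V = tau * logsumexp(Q / tau, axis=1)          # soft backup: tau * log sum_a exp(Q(s,a)/tau)

pi = np.exp((Q - V[:, None]) / tau)               # induced softmax (Boltzmann) policy
print(np.round(V, 3), pi.sum(axis=1))             # rows of pi sum to 1
```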
arXiv Detail & Related papers (2023-05-22T16:13:05Z) - Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes [18.35462792871242]
Policy Mirror Descent (PMD) is a family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.
Motivated by the instability of policy iteration with inexact policy evaluation, PMD algorithmically regularises the policy improvement step of PI.
We show that the dimension-free $\gamma$-rate of PI can be achieved by the general family of unregularised PMD algorithms under an adaptive step-size.
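With the KL divergence as the mirror map, the PMD policy update is multiplicative; a minimal tabular sketch with exact evaluation and a fixed step size (the paper's adaptive step-size rule is not reproduced, and the MDP is a random placeholder).

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, eta = 3, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))

pi = np.full((nS, nA), 1.0 / nA)
for _ in range(200):
    # Exact policy evaluation of the current stochastic policy
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * R).sum(axis=1))
    Q = R + gamma * P @ V
    # PMD / KL update: pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q_k(s,a))
    pi = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))   # per-state shift for numerical stability
    pi /= pi.sum(axis=1, keepdims=True)

print("greedy actions:", Q.argmax(axis=1), " policy:\n", np.round(pi, 3))
```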
arXiv Detail & Related papers (2023-02-22T13:55:08Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
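The clipped surrogate that PPO, and hence this extension, optimizes can be written in a few lines; the sketch below shows the loss only, with placeholder inputs, and does not include the paper's handling of topological constraints.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (to be maximized): mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(ratio * advantages, clipped).mean()

# Hypothetical batch of log-probabilities and advantage estimates
rng = np.random.default_rng(4)
logp_old = rng.normal(-1.0, 0.1, size=64)
logp_new = logp_old + rng.normal(0.0, 0.05, size=64)
print(ppo_clip_loss(logp_new, logp_old, rng.normal(size=64)))
```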
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - A unified algorithm framework for mean-variance optimization in discounted Markov decision processes [7.510742715895749]
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs).
We introduce a pseudo mean to transform the otherwise intractable MDP into a standard one with a redefined reward function.
We propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization.
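The pseudo-mean device rests on the identity E[(r - b)^2] = Var(r) + (E[r] - b)^2: for any fixed b, the redefined objective E[r] - lam * E[(r - b)^2] lower-bounds the mean-variance objective, with equality at b = E[r]. The numerical check below illustrates this identity only; the paper's exact reward redefinition and bilevel scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(5)
r = rng.normal(loc=2.0, scale=1.5, size=100_000)    # hypothetical per-step rewards
lam = 0.3

def surrogate(b):
    # Objective of a standard MDP with the redefined reward r - lam * (r - b)^2
    return r.mean() - lam * ((r - b) ** 2).mean()

bs = np.linspace(0.0, 4.0, 401)
best_b = bs[np.argmax([surrogate(b) for b in bs])]
print(best_b, r.mean())                             # the maximizing b is (approximately) the true mean
print(surrogate(best_b), r.mean() - lam * r.var())  # and the optimum matches mean - lam * variance
```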
arXiv Detail & Related papers (2022-01-15T02:19:56Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
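For context, the squared Bellman error that this convex loss is positioned against is the quantity below; this is a sketch of the standard baseline only, not of the paper's logistic loss, and the tabular inputs are random placeholders.

```python
import numpy as np

def squared_bellman_error(Q, R, P, pi, gamma):
    """Mean squared Bellman error of Q for evaluating policy pi."""
    V_pi = (pi * Q).sum(axis=1)                   # V_pi(s) = sum_a pi(a|s) Q(s, a)
    target = R + gamma * P @ V_pi                 # one-step Bellman target
    return np.mean((Q - target) ** 2)

rng = np.random.default_rng(6)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s']
R = rng.normal(size=(nS, nA))
pi = np.full((nS, nA), 0.5)
print(squared_bellman_error(rng.normal(size=(nS, nA)), R, P, pi, gamma))
```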
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which would have been difficult to establish without our adopted S&C approach.
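The flavor of the Lyapunov argument can be seen on a toy non-gradient iteration, where the distance to the fixed point serves as a discrete-time Lyapunov function; this generic contraction example is ours and is not the MAP-EM analysis itself.

```python
import numpy as np

g = np.cos                        # a fixed-point iteration that is not a gradient step of any objective
x_star = 0.7390851332151607       # fixed point of cos(x) (the Dottie number)

x, V_prev = 2.0, abs(2.0 - x_star)
for k in range(15):
    x = g(x)
    V = abs(x - x_star)           # candidate Lyapunov function: distance to the fixed point
    assert V < V_prev             # strict decrease along the trajectory certifies convergence
    print(f"k={k:2d}  V={V:.2e}  V/V_prev={V / V_prev:.3f}")
    V_prev = V
```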
arXiv Detail & Related papers (2020-06-23T01:34:18Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
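The algorithm being evaluated is the classic TD(0) update for policy evaluation; a minimal sketch under a generative model (the chain, step size, and sample budget are hypothetical, and the constant step size leaves a small noise floor).

```python
import numpy as np

rng = np.random.default_rng(7)
nS, gamma, alpha = 5, 0.9, 0.05
P_pi = rng.dirichlet(np.ones(nS), size=nS)        # transition matrix of the MRP induced by the policy
R_pi = rng.normal(size=nS)

V = np.zeros(nS)
for _ in range(200_000):
    s = rng.integers(nS)                          # generative model: pick any state...
    s_next = rng.choice(nS, p=P_pi[s])            # ...and sample one transition from it
    V[s] += alpha * (R_pi[s] + gamma * V[s_next] - V[s])   # TD(0) update

V_exact = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
print(np.max(np.abs(V - V_exact)))                # the l_infinity error studied above
```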
arXiv Detail & Related papers (2020-03-16T17:15:28Z)