Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes
- URL: http://arxiv.org/abs/2006.02008v1
- Date: Wed, 3 Jun 2020 01:48:43 GMT
- Title: Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes
- Authors: Junhong Xu, Kai Yin, Lantao Liu
- Abstract summary: We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs).
We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios.
- Score: 5.894659354028797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a principled kernel-based policy iteration algorithm to solve
continuous-state Markov Decision Processes (MDPs). In contrast to most
decision-theoretic planning frameworks, which assume a fully known state
transition model, we design a method that eliminates this strong assumption,
since such models are often extremely difficult to engineer in practice. To achieve
this, we first apply a second-order Taylor expansion of the value function.
The Bellman optimality equation is then approximated by a partial differential
equation that relies only on the first and second moments of the transition
model. Combining this with a kernel representation of the value function, we design
an efficient policy iteration algorithm whose policy evaluation step can be
expressed as a linear system of equations characterized by a finite set of
supporting states. We have validated the proposed method through extensive
simulations in both simplified and realistic planning scenarios, and the
experiments show that our approach significantly outperforms several baseline
methods.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix (an illustrative eigen step is sketched after this list).
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method (D-PGPD) to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To our knowledge, this is the first deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
- Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization [12.411844611718958]
We show that our method outputs a near-optimal policy after a number of queries to the generative model.
Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector.
arXiv Detail & Related papers (2022-10-21T15:49:20Z)
- Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees [39.287388288477096]
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states.
This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized (this bi-level structure is sketched after this list).
We propose a single-loop estimation algorithm with finite-time guarantees that is equipped to deal with high-dimensional state spaces.
arXiv Detail & Related papers (2022-10-04T00:11:38Z)
- Primal-dual regression approach for Markov decision processes with general state and action space [0.30458514384586394]
We develop a regression-based primal-dual martingale approach for solving finite-time MDPs with general state and action spaces.
As a result, our method allows for the construction of tight upper- and lower-biased approximations of the value functions, and provides tight approximations to the optimal policy.
arXiv Detail & Related papers (2022-10-01T11:48:22Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning (the standard linear MDP factorization is recalled after this list).
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation (FA), forms the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- On the implementation of a global optimization method for mixed-variable problems [0.30458514384586394]
The algorithm is based on the radial basis function method of Gutmann and the metric response surface method of Regis and Shoemaker.
We propose several modifications aimed at generalizing and improving these two algorithms (a minimal surrogate-based search step is sketched after this list).
arXiv Detail & Related papers (2020-09-04T13:36:56Z)
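For the Trust-Region Sequential Quadratic Programming entry above, here is a minimal sketch of the eigen-step idea under stated assumptions: the function name and the plain dense Hessian are illustrative, and the paper's reduced Hessian and stochastic models are not reproduced. The step follows the eigenvector of the smallest Hessian eigenvalue whenever that eigenvalue is negative, scaled to the trust-region radius.

```python
import numpy as np

def eigen_step(grad, hessian, radius):
    """Illustrative negative-curvature (eigen) step: if the smallest
    eigenvalue of the (reduced) Hessian is negative, move along its
    eigenvector, scaled to the trust-region radius."""
    eigvals, eigvecs = np.linalg.eigh(hessian)   # eigenvalues in ascending order
    lam, v = eigvals[0], eigvecs[:, 0]           # smallest eigenpair
    if lam >= 0.0:
        return None                              # no negative curvature to exploit
    step = radius * v                            # unit eigenvector scaled to the radius
    if grad @ step > 0.0:                        # pick the sign aligned with descent
        step = -step
    return step
```

Along such a step the quadratic model decreases by roughly 0.5 * |lam| * radius**2, which is what drives convergence to second-order stationary points.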
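The nested structure described in the Structural Estimation entry above can be written as a bi-level problem. This is a hedged sketch with assumed notation (r_theta, J, L), not the paper's formulation:

```latex
% Bi-level sketch; notation assumed, not the paper's.
\max_{\theta} \; \mathcal{L}\big(\pi^{*}_{\theta};\ \text{observed actions and states}\big)
\quad \text{s.t.} \quad
\pi^{*}_{\theta} \in \arg\max_{\pi} \, J(\pi; r_{\theta})
```

Here r_theta is the parameterized reward, J the expected return it induces (the inner problem), and L a measure of fit such as the likelihood of the observed behavior (the outer problem); a single-loop method interleaves updates to both levels instead of solving the inner problem to completion at every outer iteration.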
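For the linear MDP entry above, it helps to recall the standard low-rank factorization that this literature builds on; the symbols phi, mu, theta below are the conventional ones, not taken from this page:

```latex
% Standard linear (low-rank) MDP factorization; notation from the literature.
P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle,
\qquad
r(s, a) = \langle \phi(s, a), \theta \rangle
```

The normalization issue mentioned in the entry is that \int \langle \phi(s,a), \mu(s') \rangle \, ds' must equal one for every state-action pair; the alternative definition is designed so this holds automatically.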
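For the mixed-variable global optimization entry above, here is a minimal continuous-only sketch of one surrogate-based search step in the spirit of Gutmann-style RBF methods. Everything here is an assumption for illustration: the mixed (discrete plus continuous) handling that is the entry's actual subject is omitted, and the acquisition rule is plain surrogate minimization rather than the papers' criteria.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def propose_next_point(X, y, bounds, n_restarts=10, seed=None):
    """One step of RBF surrogate-based global search (illustrative sketch):
    fit an RBF interpolant to the evaluated points (X, y), then return the
    surrogate minimizer found from a few random restarts within the bounds."""
    rng = np.random.default_rng(seed)
    surrogate = RBFInterpolator(X, y)            # interpolates y exactly at X
    lo, hi = np.array(bounds).T                  # bounds: list of (low, high) per dim
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(lo, hi)                 # random start inside the box
        res = minimize(lambda x: surrogate(x[None])[0], x0,
                       bounds=list(zip(lo, hi)))
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x
```

In a full loop one would evaluate the true objective at the proposed point, append it to (X, y), and refit, balancing this exploitation step against an exploration criterion such as Gutmann's bumpiness measure.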
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.