Kernel Taylor-Based Value Function Approximation for Continuous-State
Markov Decision Processes
- URL: http://arxiv.org/abs/2006.02008v1
- Date: Wed, 3 Jun 2020 01:48:43 GMT
- Title: Kernel Taylor-Based Value Function Approximation for Continuous-State
Markov Decision Processes
- Authors: Junhong Xu, Kai Yin, Lantao Liu
- Abstract summary: We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs).
We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios.
- Score: 5.894659354028797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a principled kernel-based policy iteration algorithm to solve
continuous-state Markov Decision Processes (MDPs). In contrast to most
decision-theoretic planning frameworks, which assume fully known state
transition models, we design a method that eliminates this strong assumption,
since such models are often extremely difficult to engineer in reality. To achieve
this, we first apply the second-order Taylor expansion of the value function.
The Bellman optimality equation is then approximated by a partial differential
equation, which only relies on the first and second moments of the transition
model. By combining this with a kernel representation of the value function, we then design
an efficient policy iteration algorithm whose policy evaluation step can be
represented as a linear system of equations characterized by a finite set of
supporting states. We have validated the proposed method through extensive
simulations in both simplified and realistic planning scenarios, and the
experiments show that our approach substantially outperforms several baseline
methods.
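The policy evaluation step described in the abstract, where the value function is represented with kernels over a finite set of supporting states, reduces to solving a single linear system. The sketch below is a hypothetical illustration under simplifying assumptions: it uses an explicit discrete transition matrix `P` and an RBF kernel, whereas the paper works with only the first and second moments of a continuous transition model. Function names and parameters here are illustrative, not the paper's.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between two sets of 1-D states."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def kernel_policy_evaluation(S, P, r, gamma=0.9, lengthscale=1.0, reg=1e-8):
    """Policy evaluation as a finite linear system over supporting states.

    Representing V = K @ alpha and substituting into the fixed-policy
    Bellman equation V = r + gamma * P @ V gives
        (K - gamma * P @ K) @ alpha = r,
    a linear system in the kernel weights alpha.
    """
    K = rbf_kernel(S, S, lengthscale)
    A = K - gamma * P @ K
    # Small ridge term for numerical stability.
    alpha = np.linalg.solve(A + reg * np.eye(len(S)), r)
    return alpha, K

# Tiny usage example with 3 supporting states and a hand-picked
# stochastic transition matrix under a fixed policy.
S = np.array([0.0, 1.0, 2.0])
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 0.0, 1.0])
alpha, K = kernel_policy_evaluation(S, P, r)
V = K @ alpha
# V satisfies the Bellman fixed point V = r + gamma * P @ V
# up to the tiny regularization term.
```

The appeal of this formulation is that the cost of policy evaluation is governed by the number of supporting states, not by the (continuous) size of the state space.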
Related papers
- Unraveling Rodeo Algorithm Through the Zeeman Model [0.0]
We unravel the Rodeo Algorithm to determine the eigenstates and eigenvalues spectrum for a general Hamiltonian considering arbitrary initial states.
We use the Pennylane and Qiskit platforms to analyze scenarios where the Hamiltonians are described by the Zeeman model for one and two spins.
arXiv Detail & Related papers (2024-07-16T01:29:25Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization [12.411844611718958]
We show that our method outputs a near-optimal policy after a number of queries to the generative model.
Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector.
arXiv Detail & Related papers (2022-10-21T15:49:20Z)
- Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees [39.287388288477096]
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states.
This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized.
We propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with high-dimensional state spaces.
arXiv Detail & Related papers (2022-10-04T00:11:38Z)
- Primal-dual regression approach for Markov decision processes with general state and action space [0.30458514384586394]
We develop a regression-based primal-dual martingale approach for solving finite-time MDPs with general state and action space.
As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and provides tight approximations to the optimal policy.
arXiv Detail & Related papers (2022-10-01T11:48:22Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- On the implementation of a global optimization method for mixed-variable problems [0.30458514384586394]
The algorithm is based on the radial basis function method of Gutmann and the metric response surface method of Regis and Shoemaker.
We propose several modifications aimed at generalizing and improving these two algorithms.
arXiv Detail & Related papers (2020-09-04T13:36:56Z)
- Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.