An Adaptive State Aggregation Algorithm for Markov Decision Processes
- URL: http://arxiv.org/abs/2107.11053v1
- Date: Fri, 23 Jul 2021 07:19:43 GMT
- Title: An Adaptive State Aggregation Algorithm for Markov Decision Processes
- Authors: Guanting Chen, Johann Demetrio Gaebler, Matt Peng, Chunlin Sun, Yinyu
Ye
- Abstract summary: We propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values.
Our algorithm converges almost surely to within (2varepsilon / (1 - gamma) of the true optimal value in the (ellinfty) norm, where (gamma) is the discount factor and aggregated states differ by at most (varepsilon)
- Score: 10.494611365482028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value iteration is a well-known method of solving Markov Decision Processes
(MDPs) that is simple to implement and boasts strong theoretical convergence
guarantees. However, the computational cost of value iteration quickly becomes
infeasible as the size of the state space increases. Various methods have been
proposed to overcome this issue for value iteration in large state and action
space MDPs, often at the price, however, of generalizability and algorithmic
simplicity. In this paper, we propose an intuitive algorithm for solving MDPs
that reduces the cost of value iteration updates by dynamically grouping
together states with similar cost-to-go values. We also prove that our
algorithm converges almost surely to within \(2\varepsilon / (1 - \gamma)\) of
the true optimal value in the \(\ell^\infty\) norm, where \(\gamma\) is the
discount factor and aggregated states differ by at most \(\varepsilon\).
Numerical experiments on a variety of simulated environments confirm the
robustness of our algorithm and its ability to solve MDPs with much cheaper
updates especially as the scale of the MDP problem increases.
Related papers
- Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Fast, Scalable, Warm-Start Semidefinite Programming with Spectral
Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Structural Estimation of Markov Decision Processes in High-Dimensional
State Space with Finite-Time Guarantees [39.287388288477096]
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states.
This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified while in the outer problem, a measure of fit is maximized.
We propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with high-dimensional state spaces.
arXiv Detail & Related papers (2022-10-04T00:11:38Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
Some variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z) - Scalable First-Order Methods for Robust MDPs [31.29341288414507]
This paper proposes the first first-order framework for solving robust MDPs.
By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of $O left( A2 S3log(S)log(epsilon-1) epsilon-1 right)$
arXiv Detail & Related papers (2020-05-11T21:06:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.