CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov
Decision Process Multidimensional Problem
- URL: http://arxiv.org/abs/2103.00331v1
- Date: Sat, 27 Feb 2021 21:33:19 GMT
- Title: CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov
Decision Process Multidimensional Problem
- Authors: Daniela Kuinchtner, Afonso Sales, Felipe Meneguzzi
- Abstract summary: We develop an MDP solver for a multidimensional problem using a tensor decomposition method.
We show that our approach can compute much larger problems using substantially less memory.
- Score: 21.79259092920586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Markov Decision Process (MDP) is the underlying model for optimal
planning for decision-theoretic agents in stochastic environments. Although much
research focuses on solving MDP problems either in tabular form or using factored
representations, none has focused on tensor decomposition methods. Solving MDPs
using tensor algebra offers the prospect of leveraging advances in tensor-based
computations to further increase solver efficiency. In this paper, we develop
an MDP solver for a multidimensional problem using a tensor decomposition
method to compress the transition models and optimize the value iteration and
policy iteration algorithms. We empirically evaluate our approach against
tabular methods and show that our approach can compute much larger problems using
substantially less memory, opening up new possibilities for tensor-based
approaches in stochastic planning.
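The compression the abstract describes can be sketched in a few lines: a rank-R CP (CANDECOMP-PARAFAC) decomposition stores three small factor matrices instead of the full S x A x S transition tensor, and the expected-value contraction inside value iteration can be carried out directly on the factors. This is only an illustrative sketch, not the authors' solver; the sizes, rank, and random factor values are all assumptions.

```python
import numpy as np

# Illustrative sketch only: a rank-R CP tensor stands in for an MDP
# transition model T[s, a, s']; sizes, rank, and values are assumptions.
rng = np.random.default_rng(0)
S, A_n, R = 20, 4, 3                     # states, actions, CP rank

A = rng.random((S, R))                   # state factor
B = rng.random((A_n, R))                 # action factor
C = rng.random((S, R))                   # next-state factor

# Tabular baseline: materialize all S*A*S entries (1600 here) and
# normalize each T[s, a, :] into a probability distribution.
T = np.einsum('ir,jr,kr->ijk', A, B, C)
T_dense = T / T.sum(axis=2, keepdims=True)

V = rng.random(S)                        # current value function
Q_dense = np.einsum('ijk,k->ij', T_dense, V)  # Q[s,a] = sum_s' T[s,a,s'] V[s']

# Factored path: never materialize T. Contract V (and the normalizer)
# against the factors; storage is (2S + A_n) * R = 132 numbers, not 1600.
w = C.T @ V                              # (R,)  sum_s' C[s',r] V[s']
Z = (A * C.sum(axis=0)) @ B.T            # per-(s,a) normalizer from factors
Q_cp = (A * w) @ B.T / Z

print(np.allclose(Q_dense, Q_cp))        # True: same Q, far less memory
```

The same contraction is the inner loop of both value iteration and policy evaluation, which is where the memory savings reported in the abstract come from.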
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z)
- Feature selection in linear SVMs via hard cardinality constraint: a scalable SDP decomposition approach [3.7876216422538485]
We study the embedded feature selection problem in linear Support Vector Machines (SVMs)
A cardinality constraint is employed, leading to a fully explainable selection model.
The problem is NP-hard due to the presence of the cardinality constraint.
arXiv Detail & Related papers (2024-04-15T19:15:32Z)
- Numerical Methods for Convex Multistage Stochastic Optimization [86.45244607927732]
We focus on Stochastic Programming (SP), Stochastic Optimal Control (SOC), and Markov Decision Processes (MDP).
Recent progress in solving convex multistage stochastic problems is based on cutting-plane approximations of the cost-to-go functions of dynamic programming equations.
Cutting-plane-type methods can handle multistage problems with a large number of stages, but a relatively small number of state (decision) variables.
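The cutting-plane idea described above is easy to illustrate: a convex cost-to-go function is approximated from below by the maximum of tangent lines, and the approximation tightens near the points where cuts were taken. The quadratic objective and query points here are hypothetical stand-ins, not from the paper.

```python
import numpy as np

# Kelley-style cutting planes: under-approximate a convex function by the
# max of its tangents, as in cost-to-go approximations for multistage
# problems. The function f and the cut points are illustrative.
f = lambda x: (x - 1.0) ** 2
df = lambda x: 2.0 * (x - 1.0)           # derivative of f

cuts = []                                 # (slope, intercept) pairs
for x0 in [-2.0, 0.0, 2.0, 1.5]:
    cuts.append((df(x0), f(x0) - df(x0) * x0))   # tangent line at x0

def lower_model(x):
    # Piecewise-linear minorant: pointwise max over all cuts.
    return max(a * x + b for a, b in cuts)

xs = np.linspace(-2, 2, 101)
assert all(lower_model(x) <= f(x) + 1e-9 for x in xs)  # valid lower bound
print(round(lower_model(1.0), 3))         # -0.25, close to f(1.0) = 0
```

Each solver iteration would add one more cut at the current incumbent, monotonically tightening the model.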
arXiv Detail & Related papers (2023-03-28T01:30:40Z)
- Multistage Stochastic Optimization via Kernels [3.7565501074323224]
We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems.
We show that the proposed method produces decision rules with near-optimal average performance.
arXiv Detail & Related papers (2023-03-11T23:19:32Z)
- Robust Markov Decision Processes without Model Estimation [32.16801929347098]
There are two major barriers to applying robust MDPs in practice.
First, most works study robust MDPs in a model-based regime.
Second, prior work typically assumes a strong oracle to obtain the optimal solution.
arXiv Detail & Related papers (2023-02-02T17:29:10Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
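The Mirror Descent step behind such methods, taken with the entropy mirror map on the probability simplex, reduces to a multiplicative-weights update followed by renormalization. The toy linear cost and step size below are assumptions, not the paper's setting.

```python
import numpy as np

# Minimal mirror descent with the entropy mirror map (exponentiated
# gradient) on the probability simplex -- the update primitive behind
# MD-style policy optimization. The linear cost is a toy example.
costs = np.array([0.9, 0.2, 0.5, 0.7])   # per-action costs (illustrative)
p = np.full(4, 0.25)                      # start from the uniform policy
eta = 0.5                                 # step size (assumption)

for _ in range(200):
    grad = costs                          # gradient of the loss <p, costs>
    p = p * np.exp(-eta * grad)           # multiplicative-weights step
    p /= p.sum()                          # Bregman projection to the simplex

print(p.argmax())                         # 1: mass on the cheapest action
```

The entropy mirror map keeps iterates strictly inside the simplex, which is why the projection is a simple renormalization rather than a Euclidean projection.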
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the same solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z)
- GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
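The vanilla Cross-Entropy Method that GACEM generalizes fits in a few lines: sample candidates from a parametric distribution, keep the elite fraction, and refit the distribution to the elites. GACEM swaps the Gaussian below for a masked autoregressive network so the search distribution can be multi-modal; the objective and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Bare-bones Cross-Entropy Method on a toy 1-D black-box objective.
# GACEM replaces the Gaussian search distribution with a masked
# autoregressive network; everything here is an illustrative sketch.
rng = np.random.default_rng(1)
f = lambda x: (x - 3.0) ** 2              # toy objective, minimum at x = 3
mu, sigma = 0.0, 5.0                      # initial search distribution

for _ in range(50):
    xs = rng.normal(mu, sigma, size=200)          # sample candidates
    elites = xs[np.argsort(f(xs))[:20]]           # keep the best 10%
    mu, sigma = elites.mean(), elites.std() + 1e-6  # refit to elites

print(round(mu, 2))                       # converges near the minimizer 3.0
```

Because a single Gaussian collapses onto one mode, classic CEM can only track one solution region at a time; that limitation is what the autoregressive model in GACEM is meant to remove.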
This list is automatically generated from the titles and abstracts of the papers in this site.
The quality of this site (including all information) is not guaranteed, and the site is not responsible for any consequences of its use.