The Geometry of Memoryless Stochastic Policy Optimization in
Infinite-Horizon POMDPs
- URL: http://arxiv.org/abs/2110.07409v2
- Date: Fri, 15 Oct 2021 13:34:42 GMT
- Title: The Geometry of Memoryless Stochastic Policy Optimization in
Infinite-Horizon POMDPs
- Authors: Johannes Müller, Guido Montúfar
- Abstract summary: We consider the problem of finding the best memoryless policy for an infinite-horizon partially observable Markov decision process.
We show that the discounted state-action frequencies and the expected cumulative reward are rational functions of the policy, whose degree is determined by the degree of partial observability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of finding the best memoryless stochastic policy for
an infinite-horizon partially observable Markov decision process (POMDP) with
finite state and action spaces with respect to either the discounted or mean
reward criterion. We show that the (discounted) state-action frequencies and
the expected cumulative reward are rational functions of the policy, whereby
the degree is determined by the degree of partial observability. We then
describe the optimization problem as a linear optimization problem in the space
of feasible state-action frequencies subject to polynomial constraints that we
characterize explicitly. This allows us to address the combinatorial and
geometric complexity of the optimization problem using recent tools from
polynomial optimization. In particular, we demonstrate how the partial
observability constraints can lead to multiple smooth and non-smooth local
optimizers and we estimate the number of critical points.
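For orientation, the fully observable case admits a well-known linear-programming formulation over the discounted state-action frequencies $\eta(s,a) = (1-\gamma)\sum_{t\ge 0}\gamma^{t}\,\mathrm{Pr}(s_t=s, a_t=a)$; the display below is a standard background sketch for the discounted criterion, not a statement of the paper's constraints, and $P$, $r$, $\gamma$, $\mu$ denote the transition kernel, reward, discount factor, and initial state distribution in conventional notation:
$$\max_{\eta \ge 0}\; \sum_{s,a} r(s,a)\,\eta(s,a) \quad \text{s.t.} \quad \sum_{a} \eta(s',a) \;=\; (1-\gamma)\,\mu(s') \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\,\eta(s,a) \;\; \text{for all } s'.$$
Under partial observability, the abstract states that this polytope of feasible frequencies is further cut down by polynomial constraints, which the paper characterizes explicitly.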
Related papers
- Structured Regularization for Constrained Optimization on the SPD Manifold [1.1126342180866644]
We introduce a class of structured regularizers, based on symmetric gauge functions, which allow for solving constrained optimization on the SPD manifold with faster unconstrained methods.
We show that our structured regularizers can be chosen to preserve or induce desirable structure, in particular convexity and "difference of convex" structure.
arXiv Detail & Related papers (2024-10-12T22:11:22Z) - Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Optimizing Chance-Constrained Submodular Problems with Variable
Uncertainties [12.095075636344536]
We study chance-constrained submodular optimization problems, which capture a wide range of problems with constraints.
We present greedy algorithms that can obtain a high-quality solution, i.e., a constant approximation ratio relative to the optimal solution.
arXiv Detail & Related papers (2023-09-23T04:48:49Z) - Multistage Stochastic Optimization via Kernels [3.7565501074323224]
We develop a non-parametric, data-driven, tractable approach for solving multistage optimization problems.
We show that the proposed method produces decision rules with near-optimal average performance.
arXiv Detail & Related papers (2023-03-11T23:19:32Z) - Solving infinite-horizon POMDPs with memoryless stochastic policies in
state-action space [0.0]
Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies.
We present an approach for Reward Optimization in State-Action space (ROSA).
We find that ROSA is computationally efficient and can yield improvements over other existing methods. (A minimal numerical sketch of the state-action frequency linear program, for the fully observable case, appears after this list.)
arXiv Detail & Related papers (2022-05-27T16:56:59Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Sparse Signal Reconstruction for Nonlinear Models via Piecewise Rational
Optimization [27.080837460030583]
We propose a method to reconstruct signals degraded by a nonlinear distortion and acquired at a limited sampling rate.
Our method is formulated as a non-convex problem combining an exact data-fitting term and a penalization term.
Simulations illustrate the benefits of the proposed approach.
arXiv Detail & Related papers (2020-10-29T09:05:19Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
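As a purely illustrative aside (hypothetical, not taken from any of the papers above), the equivalence between reward optimization in a fully observable MDP and a linear program over state-action frequencies, mentioned in the ROSA entry, can be checked numerically on a toy instance; the transition kernel, rewards, and initial distribution below are randomly generated placeholders, and scipy's generic LP solver stands in for any dedicated method.

# Illustrative sketch: discounted state-action frequency LP for a small,
# randomly generated, fully observable MDP, solved with scipy.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, :] = P(. | s, a)
r = rng.standard_normal((n_states, n_actions))                    # reward r(s, a)
mu = np.full(n_states, 1.0 / n_states)                            # initial state distribution

# Flow constraints: for every state s',
#   sum_a eta(s', a) - gamma * sum_{s, a} P(s' | s, a) * eta(s, a) = (1 - gamma) * mu(s')
A_eq = np.zeros((n_states, n_states * n_actions))
for s in range(n_states):
    for a in range(n_actions):
        idx = s * n_actions + a
        A_eq[s, idx] += 1.0
        A_eq[:, idx] -= gamma * P[s, a]
b_eq = (1.0 - gamma) * mu

# linprog minimizes, so negate the rewards; eta >= 0 via the bounds.
res = linprog(c=-r.reshape(-1), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
eta = res.x.reshape(n_states, n_actions)
print("state-action frequencies sum to", round(eta.sum(), 6))        # should be ~1
print("optimal normalized discounted reward:", float(r.reshape(-1) @ res.x))

Under partial observability, the main paper shows that the feasible set of such frequencies is no longer a polytope but is cut out by additional polynomial constraints, which is what produces the multiple smooth and non-smooth local optimizers described in the abstract.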
This list is automatically generated from the titles and abstracts of the papers in this site.