Finite-Time Analysis of Natural Actor-Critic for POMDPs
- URL: http://arxiv.org/abs/2202.09753v3
- Date: Wed, 19 Jul 2023 14:55:17 GMT
- Title: Finite-Time Analysis of Natural Actor-Critic for POMDPs
- Authors: Semih Cayci, Niao He, R. Srikant
- Abstract summary: We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces.
We consider a natural actor-critic method that employs a finite internal memory for policy parameterization.
We show that the error due to using finite-state controllers can be made small in the case of sliding-block controllers by using larger block sizes.
- Score: 29.978816372127085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the reinforcement learning problem for partially observed Markov
decision processes (POMDPs) with large or even countably infinite state spaces,
where the controller has access to only noisy observations of the underlying
controlled Markov chain. We consider a natural actor-critic method that employs
a finite internal memory for policy parameterization, and a multi-step temporal
difference learning algorithm for policy evaluation. We establish, to the best
of our knowledge, the first non-asymptotic global convergence of actor-critic
methods for partially observed systems under function approximation. In
particular, in addition to the function approximation and statistical errors
that also arise in MDPs, we explicitly characterize the error due to the use of
finite-state controllers. This additional error is stated in terms of the total
variation distance between the traditional belief state in POMDPs and the
posterior distribution of the hidden state when using a finite-state
controller. Further, we show that this error can be made small in the case of
sliding-block controllers by using larger block sizes.
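The paper is theoretical and ships no code; the following is a minimal sketch of the kind of algorithm it analyzes, assuming a made-up two-state toy POMDP. A sliding-block controller conditions only on the last M observations, a multi-step TD critic with linear (here one-hot) features evaluates the current policy, and the critic weights drive a natural-gradient update of a softmax policy. All environment details and hyperparameters are illustrative assumptions, not taken from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy POMDP: hidden state in {0, 1}, noisy binary observation, two actions.
P_FLIP, P_OBS_ERR = 0.1, 0.2

def step(s, a):
    r = 1.0 if a == s else 0.0               # reward for matching the hidden state
    s = 1 - s if rng.random() < P_FLIP else s
    o = 1 - s if rng.random() < P_OBS_ERR else s
    return s, o, r

# Sliding-block memory: the last M observations form the controller's internal state.
M, N_ACT = 4, 2
D = (2 ** M) * N_ACT                          # one-hot feature dimension over (window, action)

def feat(window, a):
    x = np.zeros(D)
    x[int("".join(map(str, window)), 2) * N_ACT + a] = 1.0
    return x

def policy_probs(theta, window):
    logits = np.array([theta @ feat(window, a) for a in range(N_ACT)])
    p = np.exp(logits - logits.max())
    return p / p.sum()

theta = np.zeros(D)                           # log-linear (softmax) policy parameters
ALPHA, ETA, GAMMA, N_STEP = 0.1, 0.05, 0.95, 4

for it in range(100):
    # Policy evaluation: multi-step (n-step) TD with linear function approximation.
    w = np.zeros(D)                           # critic weights for Q under the current policy
    s, o, window = 0, 0, [0] * M
    traj = []
    for t in range(500):
        window = window[1:] + [o]
        a = rng.choice(N_ACT, p=policy_probs(theta, window))
        s, o, r = step(s, a)
        traj.append((list(window), a, r))
        if len(traj) > N_STEP:
            z0, a0, _ = traj[-N_STEP - 1]
            target = sum(GAMMA ** k * traj[-N_STEP - 1 + k][2] for k in range(N_STEP))
            target += GAMMA ** N_STEP * (w @ feat(window, a))
            x0 = feat(z0, a0)
            w += ALPHA * (target - w @ x0) * x0
    # Policy improvement: with one-hot (compatible) features and a softmax policy,
    # the critic weights serve as a natural-policy-gradient direction.
    theta += ETA * w

print("P(action | last window):", policy_probs(theta, window))
```
Increasing M in this sketch corresponds to the larger block sizes discussed in the abstract: a longer window makes the finite-memory posterior closer to the full belief, at the cost of a larger feature space.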
Related papers
- Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP).
We use transition observations from the original MDP, whether they are generated under the same or a different policy.
Our estimator also admits statistical inference via Wald confidence intervals.
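A note on the Wald intervals mentioned above: for an asymptotically normal estimator they follow the usual recipe, estimate plus or minus a normal quantile times the standard error. A generic sketch (not the paper's estimator; the numbers are invented):
```python
# Illustrative only: a generic Wald-type confidence interval for an
# asymptotically normal policy-value estimator.
from scipy.stats import norm

def wald_ci(estimate, std_err, alpha=0.05):
    """Two-sided (1 - alpha) Wald interval: estimate +/- z_{1-alpha/2} * std_err."""
    z = norm.ppf(1.0 - alpha / 2.0)
    return estimate - z * std_err, estimate + z * std_err

# e.g. an estimated robust policy value of 1.8 with standard error 0.1
print(wald_ci(1.8, 0.1))   # approx. (1.604, 1.996)
```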
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal
Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian
Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
Using samples of the noise, we bound the transition probabilities and capture these bounds in the intervals of a so-called interval Markov decision process (iMDP).
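For concreteness, one simple way to turn noise samples into a transition probability interval is a Hoeffding-style confidence bound per transition; the paper's own construction may differ (e.g. tighter scenario-based bounds), so treat this as an illustrative assumption:
```python
# Illustrative sketch: sample-based bounds on one transition probability of an
# iMDP abstraction, using a Hoeffding confidence interval.
import numpy as np

def transition_interval(n_hits, n_samples, confidence=0.99):
    """Interval [p_low, p_high] for P(successor state lands in a given region),
    from n_hits successes out of n_samples sampled noise realizations."""
    p_hat = n_hits / n_samples
    eps = np.sqrt(np.log(2.0 / (1.0 - confidence)) / (2.0 * n_samples))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# e.g. 412 of 1000 sampled successor states fall in the target region
print(transition_interval(412, 1000))
```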
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Correct-by-construction reach-avoid control of partially observable
linear stochastic systems [7.912008109232803]
We formalize a robust feedback controller synthesis problem for reach-avoid control of discrete-time, linear time-invariant systems.
The problem is to compute a controller that satisfies a reach-avoid specification with at least the required probability.
arXiv Detail & Related papers (2021-03-03T13:46:52Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
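For reference, the basic building block is the mirror descent step; with the entropy mirror map on the probability simplex it reduces to an exponentiated-gradient update. A minimal sketch (not the paper's parallel algorithm):
```python
# Illustrative only: one step of stochastic mirror descent with the entropy
# mirror map (exponentiated gradient) on the probability simplex.
import numpy as np

def entropic_md_step(x, stochastic_grad, step_size):
    """x_{k+1} proportional to x_k * exp(-step_size * g_k), renormalized."""
    y = x * np.exp(-step_size * stochastic_grad)
    return y / y.sum()

x = np.full(4, 0.25)                       # uniform point on the simplex
g = np.array([0.3, -0.1, 0.0, 0.2])        # a noisy subgradient estimate
print(entropic_md_step(x, g, step_size=0.5))
```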
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
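The core computation behind this approach is the Stein variational gradient descent (SVGD) update applied to a population of candidate control sequences. A toy sketch with an RBF kernel and an invented quadratic cost, not the paper's implementation:
```python
import numpy as np

def rbf_kernel(X, h=1.0):
    diff = X[:, None, :] - X[None, :, :]                      # pairwise differences
    K = np.exp(-(diff ** 2).sum(-1) / (2.0 * h ** 2))         # RBF kernel matrix
    grad_K = -diff * K[:, :, None] / h ** 2                   # d k(x_a, x_b) / d x_a
    return K, grad_K

def svgd_step(U, grad_log_p, step=0.1):
    """U: (n_particles, horizon) control sequences; grad_log_p: same shape."""
    K, grad_K = rbf_kernel(U)
    phi = (K @ grad_log_p + grad_K.sum(axis=0)) / U.shape[0]  # SVGD direction
    return U + step * phi

u_ref = np.linspace(0.0, 1.0, 5)                              # invented reference controls
U = np.random.default_rng(0).normal(size=(8, 5))              # 8 candidate sequences
for _ in range(50):
    U = svgd_step(U, grad_log_p=-(U - u_ref))                 # grad log of exp(-0.5 * ||u - u_ref||^2)
print(U.mean(axis=0))                                         # particles cluster around u_ref
```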
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
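For context, the CNML distribution that ACNML approximates scores each candidate label by the likelihood the model assigns it after being refit with that label included, then normalizes over labels; schematically (our notation):
```latex
% CNML for a query input x, training set D, and candidate labels y:
% refit the model with (x, y) appended, score y, then normalize over labels.
\[
  p_{\mathrm{CNML}}(y \mid x) \;=\;
  \frac{p_{\hat\theta(x,y)}(y \mid x)}
       {\sum_{y'} p_{\hat\theta(x,y')}(y' \mid x)},
  \qquad
  \hat\theta(x,y) \;=\; \arg\max_{\theta}\;
  \Big[\log p_\theta(y \mid x) + \textstyle\sum_{(x_i,y_i)\in D}\log p_\theta(y_i \mid x_i)\Big].
\]
```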
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Near Optimality of Finite Memory Feedback Policies in Partially Observed
Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and the measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
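Both this entry and the main paper above measure the price of finite memory by how well a window of recent observations approximates the full Bayesian belief; schematically (our notation, not either paper's exact statement):
```latex
% Schematic: the finite-memory error is controlled by the total-variation
% distance between the full belief and the posterior given only a length-N window.
\[
  \varepsilon_N \;\propto\; \sup_{t}\;
  \mathbb{E}\,
  \big\| \Pr(S_t \in \cdot \mid O_{0:t}, A_{0:t-1})
       \;-\;
       \Pr(S_t \in \cdot \mid O_{t-N+1:t}, A_{t-N:t-1}) \big\|_{\mathrm{TV}},
\]
% which vanishes as the window length N grows under filter-stability conditions.
```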
arXiv Detail & Related papers (2020-10-15T00:37:51Z)