Globally Convergent Policy Search over Dynamic Filters for Output
Estimation
- URL: http://arxiv.org/abs/2202.11659v1
- Date: Wed, 23 Feb 2022 18:06:20 GMT
- Title: Globally Convergent Policy Search over Dynamic Filters for Output
Estimation
- Authors: Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ
Tedrake
- Abstract summary: We introduce the first direct policy search algorithm which provably converges to the globally optimal $\textit{dynamic}$ filter.
We show that informativity overcomes the degeneracies that arise when optimizing over filters that maintain internal state.
- Score: 64.90951294952094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the first direct policy search algorithm which provably
converges to the globally optimal $\textit{dynamic}$ filter for the classical
problem of predicting the outputs of a linear dynamical system, given noisy,
partial observations. Despite the ubiquity of partial observability in
practice, theoretical guarantees for direct policy search algorithms, one of
the backbones of modern reinforcement learning, have proven difficult to
achieve. This is primarily due to the degeneracies which arise when optimizing
over filters that maintain internal state.
In this paper, we provide a new perspective on this challenging problem based
on the notion of $\textit{informativity}$, which intuitively requires that all
components of a filter's internal state are representative of the true state of
the underlying dynamical system. We show that informativity overcomes the
aforementioned degeneracy. Specifically, we propose a $\textit{regularizer}$
which explicitly enforces informativity, and establish that gradient descent on
this regularized objective - combined with a ``reconditioning step'' -
converges to the globally optimal cost at a $\mathcal{O}(1/T)$ rate. Our analysis
relies on several new results which may be of independent interest, including a
new framework for analyzing non-convex gradient descent via convex
reformulation, and novel bounds on the solution to linear Lyapunov equations in
terms of (our quantitative measure of) informativity.
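For intuition, here is a minimal, illustrative JAX sketch of direct policy search over a dynamic filter for output prediction: a filter with internal state $s_{t+1} = A_F s_t + B_F y_t$ and prediction $\hat{y}_t = C_F s_t$ is trained by gradient descent on the empirical prediction error plus a regularizer. This is not the paper's algorithm: the log-det Gramian penalty below is only a hypothetical stand-in for the informativity regularizer, the reconditioning step is omitted, and the system matrices, horizon, and step size are arbitrary toy choices.
```python
import jax
import jax.numpy as jnp

# True (unknown) linear system: x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t.
n, p, T = 3, 1, 200
A = 0.9 * jnp.eye(n) + 0.05 * jax.random.normal(jax.random.PRNGKey(0), (n, n))
C = jax.random.normal(jax.random.PRNGKey(1), (p, n))

def simulate(key, T):
    """Roll out the true system and return noisy observations y_1..y_T."""
    k1, k2 = jax.random.split(key)
    w = 0.1 * jax.random.normal(k1, (T, n))   # process noise
    v = 0.1 * jax.random.normal(k2, (T, p))   # measurement noise
    def step(x, noise):
        wt, vt = noise
        y = C @ x + vt
        return A @ x + wt, y
    _, ys = jax.lax.scan(step, jnp.zeros(n), (w, v))
    return ys

def prediction_cost(params, ys):
    """Mean one-step output prediction error of the dynamic filter (A_F, B_F, C_F)."""
    AF, BF, CF = params
    def step(s, y):
        yhat = CF @ s                 # predict the current output from the filter state
        s_next = AF @ s + BF @ y      # fold the new observation into the filter state
        return s_next, jnp.sum((y - yhat) ** 2)
    _, errs = jax.lax.scan(step, jnp.zeros(AF.shape[0]), ys)
    return jnp.mean(errs)

def informativity_reg(params, lam=1e-2, horizon=20):
    """Hypothetical surrogate regularizer: a log-det penalty that keeps a truncated
    Gramian of the filter well conditioned, so no internal-state direction collapses."""
    AF, BF, _ = params
    G = sum(jnp.linalg.matrix_power(AF, k) @ BF @ BF.T @ jnp.linalg.matrix_power(AF, k).T
            for k in range(horizon))
    return -lam * jnp.linalg.slogdet(G + 1e-6 * jnp.eye(AF.shape[0]))[1]

ys = simulate(jax.random.PRNGKey(2), T)
params = (0.5 * jnp.eye(n), 0.1 * jnp.ones((n, p)), 0.1 * jnp.ones((p, n)))
loss = lambda th: prediction_cost(th, ys) + informativity_reg(th)
grad_fn = jax.jit(jax.grad(loss))

step_size = 1e-2
for t in range(500):
    g = grad_fn(params)
    params = tuple(th - step_size * gth for th, gth in zip(params, g))
print("final regularized cost:", float(loss(params)))
```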
Related papers
- Model-Free $\mu$-Synthesis: A Nonsmooth Optimization Perspective [4.477225073240389]
In this paper, we revisit an important policy search benchmark, namely $\mu$-synthesis.
Subgradient-based search methods have led to impressive numerical results in practice.
arXiv Detail & Related papers (2024-02-18T17:17:17Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Low-Switching Policy Gradient with Exploration via Online Sensitivity
Sampling [23.989009116398208]
We design a low-switching sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation.
We show that our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples.
arXiv Detail & Related papers (2023-06-15T23:51:46Z) - The Power of Learned Locally Linear Models for Nonlinear Policy
Optimization [26.45568696453259]
This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.
We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates.
arXiv Detail & Related papers (2023-05-16T17:13:00Z) - The Role of Baselines in Policy Gradient Optimization [83.42050606055822]
We show that the \emph{state value baseline} allows on-policy \emph{natural policy gradient} (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate.
We find that the primary effect of the value baseline is to \textbf{reduce the aggressiveness} of the updates rather than their variance.
arXiv Detail & Related papers (2023-01-16T06:28:00Z) - Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the \underline{P}roxy variable \underline{P}essimistic \underline{P}olicy \underline{O}ptimization ($\texttt{P3O}$) algorithm.
$\texttt{P3O}$ is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
This paper provides a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize, while a fraction of the machines may be Byzantine.
Our algorithm is based on a new concentration technique and comes with sample complexity guarantees.
It is also very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Iteratively Reweighted Least Squares for $\ell_1$-minimization with
Global Linear Convergence Rate [0.0]
Iteratively Reweighted Least Squares (IRLS) represents an important family of algorithms for non-smooth optimization.
We prove that IRLS for $\ell_1$-minimization converges to a sparse solution with a global linear rate; a minimal illustrative sketch of the IRLS iteration appears below.
arXiv Detail & Related papers (2020-12-22T18:47:22Z)
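As a concrete reference point for this last entry, here is a minimal JAX sketch of the classical IRLS iteration for $\ell_1$-minimization ($\min \|x\|_1$ subject to $Ax = b$): each step solves a weighted least-squares problem with weights $1/(|x_i| + \epsilon)$, which has a closed form that preserves feasibility. The geometric $\epsilon$ schedule and problem sizes are arbitrary illustrative choices, not the paper's rule.
```python
import jax
import jax.numpy as jnp

def irls_l1(A, b, iters=50, eps=1.0):
    """IRLS for min ||x||_1 s.t. Ax = b, via closed-form weighted least-squares steps."""
    x = A.T @ jnp.linalg.solve(A @ A.T, b)              # least-norm feasible initialization
    for _ in range(iters):
        D = jnp.diag(jnp.abs(x) + eps)                  # inverse weights: |x_i| + eps
        x = D @ A.T @ jnp.linalg.solve(A @ D @ A.T, b)  # weighted LS step, keeps Ax = b
        eps = max(eps * 0.5, 1e-10)                     # shrink the smoothing parameter
    return x

# Small compressed-sensing style example: recover a sparse vector from random measurements.
A = jax.random.normal(jax.random.PRNGKey(0), (30, 60))
x_true = jnp.zeros(60).at[:4].set(jnp.array([2.0, -1.0, 0.5, 3.0]))
b = A @ x_true
x_hat = irls_l1(A, b)
print("recovery error:", float(jnp.linalg.norm(x_hat - x_true)))
```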