On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix
- URL: http://arxiv.org/abs/2011.09288v3
- Date: Fri, 30 Apr 2021 05:37:21 GMT
- Title: On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix
- Authors: Ingvar Ziemann, Henrik Sandberg
- Abstract summary: We present local minimax regret lower bounds for adaptive Linear Quadratic Regulators (LQR).
We consider affinely parametrized $B$-matrices and known $A$-matrices.
It is shown that if the parametrization induces an uninformative optimal policy, logarithmic regret is impossible.
- Score: 7.507288369705302
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents local asymptotic minimax regret lower bounds for adaptive
Linear Quadratic Regulators (LQR). We consider affinely parametrized
$B$-matrices and known $A$-matrices and aim to understand when logarithmic
regret is impossible even in the presence of structural side information. After
defining the intrinsic notion of an uninformative optimal policy in terms of a
singularity condition for Fisher information, we obtain local minimax regret
lower bounds for such uninformative instances of LQR by appealing to van Trees'
inequality (Bayesian Cramér-Rao) and a representation of regret in terms of a
quadratic form (Bellman error). It is shown that if the parametrization induces
an uninformative optimal policy, logarithmic regret is impossible and the rate
is at least of order square root in the time horizon. We explicitly characterize
the notion of an uninformative optimal policy in terms of the nullspaces of
system-theoretic quantities and the particular instance parametrization.
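To make the singularity condition concrete, here is a minimal numerical sketch (not the paper's code; the matrices, the Gaussian noise model, and the scalar parameter $\theta$ are illustrative assumptions). For a known $A$ and an affinely parametrized $B(\theta) = B_0 + \theta B_1$, it computes the optimal LQR gain and the per-step Fisher information about $\theta$ along the optimal closed loop; a (near-)zero value would indicate an uninformative optimal policy in the sense above.

```python
# Minimal sketch: Fisher-information singularity test for adaptive LQR with
# known A and affinely parametrized B(theta) = B0 + theta * B1.
# All matrices below are illustrative assumptions, not the paper's example.
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # known A-matrix
B0 = np.array([[0.0], [1.0]])            # nominal input matrix
B1 = np.array([[1.0], [0.0]])            # parameter direction in B(theta)
Q, R = np.eye(2), np.eye(1)              # quadratic state/input costs
Sigma_w = np.eye(2)                      # process-noise covariance

def fisher_info_under_optimal_policy(theta):
    """Per-step Fisher information about the scalar theta when playing the
    optimal stationary policy u_t = -K x_t for the true B(theta)."""
    B = B0 + theta * B1
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal gain, u = -K x
    A_cl = A - B @ K
    Sigma_x = solve_discrete_lyapunov(A_cl, Sigma_w)    # stationary state covariance
    # x_{t+1} = A x + B(theta) u + w, so d(mean)/d(theta) = B1 u and
    # I(theta) = E[u^T B1^T Sigma_w^{-1} B1 u] with u = -K x.
    M = B1.T @ np.linalg.inv(Sigma_w) @ B1
    return float(np.trace(M @ K @ Sigma_x @ K.T))

info = fisher_info_under_optimal_policy(theta=0.5)
print(f"Fisher information: {info:.4f}")
print("uninformative optimal policy" if np.isclose(info, 0.0) else "informative")
```

In this scalar-parameter sketch the information vanishes exactly when $B_1 K$ annihilates the support of the stationary state distribution, which is in the spirit of the nullspace characterization described in the abstract.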
Related papers
- On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization [37.41427897807821]
We study the hardness of nonsmooth nonconvex optimization under cryptographic assumptions.
Local algorithms acting on Lipschitz functions cannot, in the worst case, provide meaningful local guarantees in terms of value in subexponential time.
arXiv Detail & Related papers (2024-09-16T14:35:00Z)
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
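A generic rendering of the "Lagrangifying" step (an assumption about the general recipe, not the paper's exact objective): a constrained estimation problem $\min_q f(q)$ subject to $g(q) \le 0$ becomes the saddle-point problem
$$ \min_{q} \max_{\lambda \ge 0} \; f(q) + \lambda^{\top} g(q), $$
whose objective then serves as the loss function the algorithms minimize.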
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets [101.5329678997916]
We study episodic two-player zero-sum Markov games (MGs) in the offline setting.
The goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori.
arXiv Detail & Related papers (2022-02-15T15:39:30Z)
- Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
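For reference, the gap such instance-dependent bounds scale with is the standard value-function gap
$$ \Delta(s,a) = V^{*}(s) - Q^{*}(s,a), $$
so instances whose suboptimal actions are well separated in value admit sharper regret.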
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
- Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning [15.544041797200045]
Minimum Excess Risk (MER) in Bayesian learning is defined as the difference between the minimum expected loss achievable when learning from data and the minimum expected loss that could be achieved if the underlying parameter $W$ was observed.
We derive information-theoretic bounds on the difference between these upper and lower bounds and show that they can provide order-wise tight rates for MER.
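In symbols (notation ours, following the verbal definition above): with data $Z^n$, target $Y$, loss $\ell$, and underlying parameter $W$,
$$ \mathrm{MER} = \min_{\hat y(\cdot)} \mathbb{E}\big[\ell\big(Y, \hat y(Z^n)\big)\big] - \min_{\tilde y(\cdot)} \mathbb{E}\big[\ell\big(Y, \tilde y(W)\big)\big]. $$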
arXiv Detail & Related papers (2021-05-10T08:14:10Z)
- Is Pessimism Provably Efficient for Offline RL? [104.00628430454479]
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.
We propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function.
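Schematically (a generic rendering of pessimistic value iteration, not the paper's exact episodic recursion), the update subtracts the uncertainty quantifier $\Gamma$ from the usual Bellman backup:
$$ \hat Q(s,a) \leftarrow \hat r(s,a) + \gamma\,(\hat P \hat V)(s,a) - \Gamma(s,a), \qquad \hat V(s) = \max_{a} \hat Q(s,a). $$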
arXiv Detail & Related papers (2020-12-30T09:06:57Z)
- High-Dimensional Sparse Linear Bandits [67.9378546011416]
We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime.
We also prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
arXiv Detail & Related papers (2020-11-08T16:48:11Z)
- Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization [16.536558038560695]
We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set.
The Hedge algorithm, long known to be minimax optimal for adversarial data, was recently shown to be simultaneously minimax optimal for i.i.d. data.
We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels.
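The Hedge update in question keeps the standard exponential-weights form
$$ p_{t,i} \propto \exp\!\big(-\eta_t L_{t-1,i}\big), $$
where $L_{t-1,i}$ is expert $i$'s cumulative loss and $\eta_t$ the learning rate; per the summary above, no deterministic schedule $\{\eta_t\}$ suffices outside of the extremes.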
arXiv Detail & Related papers (2020-07-13T17:54:34Z)
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
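The object studied is the ridge-regularized ERM estimator, in its standard form (generic rendering; the paper's precise loss and scaling may differ):
$$ \hat\theta \in \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, x_i^{\top}\theta\big) + \lambda \lVert\theta\rVert_2^2. $$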
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
- Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
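The pivotal property is easiest to see from the square-root Lasso objective (standard form, written here for the single-task case):
$$ \hat\beta \in \arg\min_{\beta}\; \frac{\lVert y - X\beta \rVert_2}{\sqrt{n}} + \lambda \lVert\beta\rVert_1, $$
whose theoretically optimal $\lambda$ does not depend on the noise level.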
arXiv Detail & Related papers (2020-01-15T16:11:04Z)