Fugu-MT Paper Translation (Abstract): Minimal Expected Regret in Linear Quadratic Control

Paper overview: Minimal Expected Regret in Linear Quadratic Control

arxiv url: http://arxiv.org/abs/2109.14429v1
Date: Wed, 29 Sep 2021 14:07:21 GMT
Status: Translation complete
Last updated in system: 2021-09-30 14:55:17.036213
Title: Minimal Expected Regret in Linear Quadratic Control
Title (reference translation): Minimal Expected Regret in Linear Quadratic Control
Authors: Yassir Jedra, Alexandre Proutiere
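Abstract summary: We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown.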
Reference score (independently computed attention score): 79.81807680370677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2\log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x\le d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on $A$ and $B$ and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of $A$ and $B$ and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly-varying control policy on the performance of these estimates and on the regret constitutes one of the technical challenges tackled in this paper.
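Abstract (reference translation): We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2\log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$, as they match existing lower bounds in scenario (i) when $d_x\le d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on $A$ and $B$, and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of $A$ and $B$ and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly varying control policy on the performance of these estimates and on the regret is one of the technical challenges tackled in this paper.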
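Illustrative sketch (not from the paper): the abstract describes a certainty-equivalence regulator whose estimates of $A$ and $B$ and whose control policy may be refreshed at every step. The Python code below is a minimal scalar ($d_x = d_u = 1$) illustration of that general idea only, not the authors' algorithm. Everything specific in it is an assumption made for the example: the function names `lqr_gain` and `run_certainty_equivalence`, the unit cost weights, the open-loop stable true system ($a = 0.9$, $b = 0.5$), the ridge-regularized least-squares estimator, and the constant-variance exploration noise, which stands in for whatever exploration scheme the paper actually uses.

```python
import random


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Gain k of the scalar LQR law u = -k * x for x' = a*x + b*u + noise."""
    if abs(b) < 1e-8:
        # Hypothetical safeguard: with no usable input channel, apply no feedback.
        return 0.0
    p = q
    for _ in range(iters):
        # Scalar Riccati recursion, rearranged to avoid a subtraction:
        # p = q + a^2 p - (a b p)^2 / (r + b^2 p) = q + a^2 p r / (r + b^2 p)
        p = q + a * a * p * r / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def run_certainty_equivalence(a=0.9, b=0.5, steps=3000, explore_std=0.5, seed=0):
    """Simulate a scalar system, re-estimating (a, b) and the gain at every step."""
    rng = random.Random(seed)
    sxx = suu = sxu = sxy = suy = 0.0  # running sums for least squares
    ridge = 1e-3                       # small regularizer keeps the 2x2 solve well posed
    a_hat, b_hat = 0.0, 0.0            # initial estimates (assumed)
    x, total_cost = 0.0, 0.0
    for _ in range(steps):
        # Certainty equivalence: act as if the current estimates were the truth.
        k = lqr_gain(a_hat, b_hat)
        u = -k * x + rng.gauss(0.0, explore_std)  # assumed exploration noise
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        total_cost += x * x + u * u               # cost with q = r = 1

        # Update the least-squares sums with the sample (x, u) -> x_next.
        sxx += x * x
        suu += u * u
        sxu += x * u
        sxy += x * x_next
        suy += u * x_next

        # Solve the regularized 2x2 normal equations for (a_hat, b_hat).
        det = (sxx + ridge) * (suu + ridge) - sxu * sxu
        a_hat = ((suu + ridge) * sxy - sxu * suy) / det
        b_hat = ((sxx + ridge) * suy - sxu * sxy) / det
        x = x_next
    return a_hat, b_hat, total_cost


if __name__ == "__main__":
    a_hat, b_hat, cost = run_certainty_equivalence()
    print("estimates:", a_hat, b_hat, "average cost:", cost / 3000)
```

Because the gain is recomputed from fresh estimates at every step, the data used for estimation are generated under a policy that keeps changing; the abstract names the analysis of exactly this effect as one of the paper's technical challenges. An epoch-based variant would instead call `lqr_gain` only at epoch boundaries.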


Related papers