Fugu-MT 論文翻訳(概要): Naive Exploration is Optimal for Online LQR

論文の概要: Naive Exploration is Optimal for Online LQR

arxiv url: http://arxiv.org/abs/2001.09576v4
Date: Wed, 4 Oct 2023 02:22:31 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-05 23:02:17.665811
Title: Naive Exploration is Optimal for Online LQR
Title（参考訳）: Naive ExplorationはオンラインLQRに最適
Authors: Max Simchowitz, Dylan J. Foster
Abstract要約: 最適後悔尺度は$widetildeTheta(sqrtd_mathbfu2 d_mathbfx T)$で、$T$は時間ステップの数、$d_mathbfu$は入力空間の次元、$d_mathbfx$はシステム状態の次元である。我々の下界は、かつての$mathrmpoly(logT)$-regretアルゴリズムの可能性を排除する。
参考スコア（独自算出の注目度）: 49.681825576239355
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of online adaptive control of the linear quadratic regulator, where the true system parameters are unknown. We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{\Theta}({\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T}})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state. Notably, our lower bounds rule out the possibility of a $\mathrm{poly}(\log{}T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem. Our upper bound is attained by a simple variant of $\textit{{certainty equivalent control}}$, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise. While this approach was shown to achieve $\sqrt{T}$-regret by (Mania et al. 2019), we show that if the learner continually refines their estimates of the system matrices, the method attains optimal dimension dependence as well. Central to our upper and lower bounds is a new approach for controlling perturbations of Riccati equations called the $\textit{self-bounding ODE method}$, which we use to derive suboptimality bounds for the certainty equivalent controller synthesized from estimated system dynamics. This in turn enables regret upper bounds which hold for $\textit{any stabilizable instance}$ and scale with natural control-theoretic quantities.
Abstract（参考訳）: 本稿では,真のシステムパラメータが不明な線形二次制御器のオンライン適応制御の問題について考察する。ここで、$t$ は時間ステップの数、$d_{\mathbf{u}}$ は入力空間の次元、$d_{\mathbf{x}}$ はシステム状態の次元である。特に、我々の下界は、問題の明らかな強い凸性のために予想されていた$\mathrm{poly}(\log{}T)$-regretアルゴリズムの可能性を除外している。この上界は$\textit{{certainty equivalent control}}$という単純な変種によって達成され、学習者は探索的ランダムノイズを注入しながらシステムの推定のために最適コントローラに従って制御入力を選択する。このアプローチは, (Mania et al. 2019) によって$\sqrt{T}$-regretを達成することが示されているが, 学習者が連続的にシステム行列の推定を洗練すれば, 最適次元依存性も達成できることを示す。上界と下界の中心は$\textit{self-bounding ODE method}$と呼ばれるリカティ方程式の摂動を制御するための新しいアプローチであり、推定系力学から合成された一定の等価コントローラーに対する準最適境界を導出する。これにより、$\textit{any stabilizable instance}$を保ち、自然制御理論量でスケールする後悔の上界が可能になる。

関連論文リスト

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
線形混合遷移カーネルを用いた最短経路(SSP)問題について検討する。エージェントは繰り返し環境と対話し、累積コストを最小化しながら特定の目標状態に到達する。既存の作業は、イテレーションコスト関数の厳密な下限や、最適ポリシーに対する期待長の上限を仮定することが多い。
論文参考訳（メタデータ） (2024-02-14T07:52:00Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
一般化線形モデル (GLM) フレームワークを用いて, citelu2021low で提案した一般化低ランク行列帯域問題について検討する。既存のアルゴリズムの計算不可能性と理論的制約を克服するため,まずG-ESTTフレームワークを提案する。 G-ESTT は $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret を達成でき、G-ESTS は $tildeO を達成できることを示す。
論文参考訳（メタデータ） (2024-01-14T14:14:19Z)
Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems [18.783925692307054]
本稿では,$tildemathcalO(sqrtST)$を最適にリセットするアルゴリズムを提案する。本アルゴリズムの要点は適応的非定常性検出戦略であり,最近開発されたコンテキスト多重武装バンドイット問題に対するアプローチに基づいている。
論文参考訳（メタデータ） (2021-11-06T01:30:51Z)
Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
オンライン学習アルゴリズムを考案し、その期待された後悔を保証します。当時のこの後悔は、$A$と$B$が未知の場合、$widetildeO((d_u+d_x)sqrtd_xT)$によって上界(i)となる。
論文参考訳（メタデータ） (2021-09-29T14:07:21Z)
Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
我々は、累積後悔に対して、$mathcalObig((log d)fracalpha+12Tfrac1-alpha2+log Tbig)$をミニマックス下界として証明する。第2に,汎用的なアッパー信頼境界(UCB)戦略に着想を得た,シンプルで効率的なアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-09-23T19:35:38Z)
Geometric Exploration for Online Control [38.87811800375421]
本研究では,一般的な凸コスト下での線形力学系の制御について検討する。目的は、障害フィードバックコントローラのクラスに対する後悔を最小限にすることである。
論文参考訳（メタデータ） (2020-10-25T18:11:28Z)
Black-Box Control for Linear Dynamical Systems [40.352938608995174]
ブラックボックス相互作用の単一連鎖から未知の線形時間不変力学系を制御する問題を考える。システムが制御可能であるという仮定の下で、サブ線形後悔を達成できる最初の効率的なアルゴリズムを与える。
論文参考訳（メタデータ） (2020-07-13T19:43:19Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)
Improper Learning for Non-Stochastic Control [78.65807250350755]
逆方向の摂動, 逆方向に選択された凸損失関数, 部分的に観察された状態を含む, 未知の線形力学系を制御することの問題点を考察する。このパラメトリゼーションにオンライン降下を適用することで、大規模なクローズドループポリシーに対してサブリニア後悔を実現する新しいコントローラが得られる。我々の境界は、線形力学コントローラの安定化と競合する非確率的制御設定における最初のものである。
論文参考訳（メタデータ） (2020-01-25T02:12:48Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。