Fugu-MT Paper Translation (Summary): Black-Box Control for Linear Dynamical Systems

Paper summary: Black-Box Control for Linear Dynamical Systems

arxiv url: http://arxiv.org/abs/2007.06650v3
Date: Wed, 17 Feb 2021 22:10:12 GMT
Status: Translation complete
Last updated in system: 2022-11-10 23:05:44.108994
Title: Black-Box Control for Linear Dynamical Systems
Title (reference translation): Black-box control of linear dynamical systems
Authors: Xinyi Chen, Elad Hazan
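Abstract summary: We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions. Under the assumption that the system is controllable, we give the first efficient algorithm capable of attaining sublinear regret.
Reference score (in-house attention score): 40.352938608995174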
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions, with no access to resets or offline simulation. Under the assumption that the system is controllable, we give the first efficient algorithm that is capable of attaining sublinear regret in a single trajectory under the setting of online nonstochastic control. This resolves an open problem on the stochastic LQR problem, and in a more challenging setting that allows for adversarial perturbations and adversarially chosen and changing convex loss functions. We give finite-time regret bounds for our algorithm on the order of $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) T^{2/3})$ for general nonstochastic control, and $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) \sqrt{T})$ for black-box LQR, where $\mathcal{L}$ is the system size which is an upper bound on the dimension. The crucial step is a new system identification method that is robust to adversarial noise, but incurs exponential cost. To complete the picture, we investigate the complexity of the online black-box control problem, and give a matching lower bound of $2^{\Omega(\mathcal{L})}$ on the regret, showing that the additional exponential cost is inevitable. This lower bound holds even in the noiseless setting, and applies to any, randomized or deterministic, black-box control method.
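Abstract (reference translation): We consider the problem of controlling an unknown linear time-invariant dynamical system from a single chain of black-box interactions, without resets or offline simulation. Assuming the system is controllable, we give the first efficient algorithm that can attain sublinear regret in a single trajectory under the setting of online nonstochastic control. This resolves an open problem on stochastic LQR, in a more challenging setting that allows adversarial perturbations and adversarially chosen, changing convex loss functions. We give finite-time regret bounds of $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) T^{2/3})$ for general nonstochastic control and $2^{\tilde{O}(\mathcal{L})} + \tilde{O}(\text{poly}(\mathcal{L}) \sqrt{T})$ for black-box LQR, where $\mathcal{L}$ is the system size, an upper bound on the dimension. The crucial step is a new system identification method that is robust to adversarial noise but incurs an exponential cost. We then investigate the complexity of the online black-box control problem and give a matching lower bound of $2^{\Omega(\mathcal{L})}$ on the regret, showing that the additional exponential cost is unavoidable. This lower bound holds even in the noiseless setting and applies to any black-box control method, randomized or deterministic.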
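The abstract's standing assumption is that the unknown system is controllable. For a linear time-invariant system written in the usual form $x_{t+1} = A x_t + B u_t + w_t$ (a standard convention assumed here, not spelled out in the abstract), controllability can be checked with the Kalman rank test when $A$ and $B$ are known. The sketch below is illustrative only: a black-box controller never sees $A$ or $B$, and the helper names `matmul`, `rank` and `is_controllable` are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(M):
    """Matrix rank by Gaussian elimination over exact rationals (no rounding)."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(A, B):
    """Kalman rank test: (A, B) is controllable iff rank [B, AB, ..., A^(n-1) B] = n."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(matmul(A, blocks[-1]))  # next block A^k B
    # Place the n x m blocks side by side to form the n x (n*m) controllability matrix.
    ctrb = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(ctrb) == n


# Double integrator: the input drives velocity, which drives position.
print(is_controllable([[1, 1], [0, 1]], [[0], [1]]))  # True
# Identity dynamics with input on the first coordinate only: the second is unreachable.
print(is_controllable([[1, 0], [0, 1]], [[1], [0]]))  # False
```

Exact rational arithmetic via `fractions.Fraction` keeps the rank free of floating-point round-off; for large systems a numerical rank with a tolerance would be the practical choice.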

Related papers

Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
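We consider learning in adversarial Markov decision processes (MDPs) where the loss functions can change arbitrarily over $K$ episodes. We propose two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
Reference translation (metadata) (2023-01-30T14:37:21Z)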
Optimal Dynamic Regret in LQR Control [23.91519151164528]
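We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})$.
Reference translation (metadata) (2022-06-18T18:00:21Z)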
Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems [18.783925692307054]
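We present an algorithm that achieves the optimal regret of $\tilde{\mathcal{O}}(\sqrt{ST})$. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems.
Reference translation (metadata) (2021-11-06T01:30:51Z)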
Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
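We study the problem of system identification and adaptive control of unknown ARX systems. We provide finite-time learning guarantees for ARX systems under both open-loop and closed-loop data collection.
Reference translation (metadata) (2021-08-26T18:00:00Z)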
Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
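We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ. We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
Reference translation (metadata) (2020-07-13T16:30:47Z)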
Logarithmic Regret for Adversarial Online Control [56.12283443161479]
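We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences. Our algorithm and analysis use a characterization of the offline control law to reduce the online control problem to (delayed) online learning.
Reference translation (metadata) (2020-02-29T06:29:19Z)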
Naive Exploration is Optimal for Online LQR [49.681825576239355]
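The optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state. Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had previously been conjectured.
Reference translation (metadata) (2020-01-27T03:44:54Z)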
Improper Learning for Non-Stochastic Control [78.65807250350755]
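We consider the problem of controlling an unknown linear dynamical system in the presence of adversarial perturbations, adversarially chosen convex loss functions, and partially observed states. Applying online gradient descent to this parametrization yields a new controller that attains sublinear regret against a large class of closed-loop policies. Our bounds are the first in the nonstochastic control setting that compete with stabilizing linear dynamical controllers.
Reference translation (metadata) (2020-01-25T02:12:48Z)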
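The ARX entry in the list above concerns identifying a system from a single stream of data. As a minimal sketch, assuming a scalar first-order ARX model $y_{t+1} = a y_t + b u_t + e_t$ (a hypothetical simplification, not the model or estimator of that paper), ordinary least squares on open-loop data recovers $(a, b)$. The functions `simulate_arx` and `fit_arx` and all numeric values are illustrative.

```python
import random


def simulate_arx(a, b, inputs, noise_std, seed=0):
    """Roll out the scalar ARX(1,1) model y[t+1] = a*y[t] + b*u[t] + e[t]."""
    rng = random.Random(seed)
    y = [0.0]
    for u in inputs:
        y.append(a * y[-1] + b * u + rng.gauss(0.0, noise_std))
    return y


def fit_arx(y, u):
    """Ordinary least-squares estimate of (a, b) from one trajectory."""
    n = len(u)
    # Entries of the 2x2 normal equations G @ [a, b] = h.
    syy = sum(y[t] * y[t] for t in range(n))
    syu = sum(y[t] * u[t] for t in range(n))
    suu = sum(u[t] * u[t] for t in range(n))
    hy = sum(y[t + 1] * y[t] for t in range(n))
    hu = sum(y[t + 1] * u[t] for t in range(n))
    det = syy * suu - syu * syu
    if det == 0:
        raise ValueError("data not informative enough to identify (a, b)")
    # Cramer's rule for the 2x2 system.
    return (hy * suu - hu * syu) / det, (hu * syy - hy * syu) / det


# Open-loop data collection with random +/-1 inputs for excitation.
rng = random.Random(1)
u = [rng.choice([-1.0, 1.0]) for _ in range(2000)]
y = simulate_arx(0.8, 0.5, u, noise_std=0.1)
a_hat, b_hat = fit_arx(y, u)
print(a_hat, b_hat)
```

The random inputs keep the two regressors $y_t$ and $u_t$ nearly uncorrelated. If the input were instead a fixed linear feedback such as $u_t = -k y_t$, the regressors would be collinear and the normal equations singular or badly conditioned, which illustrates why closed-loop data collection needs its own analysis.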
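The Lagrangian-relaxation entry in the list above measures computational cost in the number of Riccati equations solved. To show what one such solve involves, here is a minimal sketch for the scalar case, assuming dynamics $x_{t+1} = a x_t + b u_t$ and stage cost $q x_t^2 + r u_t^2$ with $q, r > 0$ and $b \neq 0$. It runs standard value iteration on the discrete-time algebraic Riccati equation $p = q + a^2 p - \frac{(abp)^2}{r + b^2 p}$; it is not the constrained extended LQR method of that paper, and the function names are hypothetical.

```python
def solve_scalar_dare(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate p <- q + a^2 p - (a b p)^2 / (r + b^2 p) until it reaches a fixed point."""
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


def lqr_gain(a, b, q, r):
    """Return the Riccati solution p and the gain k of the optimal policy u = -k x."""
    p = solve_scalar_dare(a, b, q, r)
    return p, a * b * p / (r + b * b * p)


p, k = lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
print(p, k)  # approximately 1.618 and 0.618
```

For $a = b = q = r = 1$ the equation reduces to $p^2 - p - 1 = 0$, so $p = (1+\sqrt{5})/2$ and $k = p/(1+p) = (\sqrt{5}-1)/2$, giving the stable closed loop $a - bk \approx 0.382$.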

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
