Fugu-MT paper translation (abstract): Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

Paper overview: Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

arxiv url: http://arxiv.org/abs/2401.00073v1
Date: Fri, 29 Dec 2023 21:06:37 GMT
Status: Translation complete
Last updated in system: 2024-01-03 19:06:24.574364
Title: Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification
Title (reference translation): Nonasymptotic regret analysis of adaptive linear quadratic control with model misspecification
Authors: Bruce D. Lee, Anders Rantzer, Nikolai Matni
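Abstract summary: In this work, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge and prove upper bounds on the expected regret after $T$ interactions with the system.
Reference score (independently computed attention score): 4.9531406053444265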
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.
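Abstract (reference translation): The strategy of pre-training a large model on a diverse dataset and then fine-tuning it for a particular application has produced impressive results in computer vision, natural language processing, and robotic control. This strategy has great potential in adaptive control, where rapid adaptation to changing conditions with limited data is required. To concretely understand the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the case where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises because the underlying dynamics cannot be estimated perfectly using the misspecified basis, and it is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms caused by the error in estimating the weights of the basis matrices have become negligible. We provide simulations that validate our analysis. The simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by the adaptive controller.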
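The two regimes described in the abstract can be illustrated with a small numerical sketch. It assumes a bound of the illustrative form c_sub * sqrt(T) + c_lin * delta * T, covering only the $\sqrt{T}$ case. The constants `c_sub` and `c_lin` and both function names are hypothetical placeholders, since the abstract does not state the actual problem-dependent constants of the paper's bounds.

```python
import math


def regret_terms(T, delta, c_sub=1.0, c_lin=1.0):
    """Return the (sublinear, linear) terms of the illustrative bound
    c_sub * sqrt(T) + c_lin * delta * T.

    c_sub and c_lin are hypothetical constants, not values from the paper.
    """
    return c_sub * math.sqrt(T), c_lin * delta * T


def crossover_horizon(delta, c_sub=1.0, c_lin=1.0):
    """Horizon T at which the two terms are equal.

    Solving c_sub * sqrt(T) = c_lin * delta * T for T > 0 gives
    T = (c_sub / (c_lin * delta)) ** 2.
    """
    return (c_sub / (c_lin * delta)) ** 2


if __name__ == "__main__":
    delta = 0.01  # assumed misspecification level, for illustration only
    print("crossover horizon:", crossover_horizon(delta))
    for T in (100, 10_000, 1_000_000):
        sub, lin = regret_terms(T, delta)
        dominant = "sublinear" if sub > lin else "linear"
        print(f"T={T}: sqrt term={sub:.1f}, delta*T term={lin:.1f}, larger: {dominant}")
```

Under these assumed constants, the crossover horizon scales as 1/delta^2, so a smaller misspecification level pushes the point where the linear term takes over to a much longer horizon.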

Related papers