Fugu-MT Paper Translation (Summary): Online Inverse Linear Optimization: Improved Regret Bound, Robustness to Suboptimality, and Toward Tight Regret Analysis

Paper summary: Online Inverse Linear Optimization: Improved Regret Bound, Robustness to Suboptimality, and Toward Tight Regret Analysis

arXiv URL: http://arxiv.org/abs/2501.14349v4
Date: Thu, 13 Feb 2025 07:05:51 GMT
Status: Translation complete
Last updated in system: 2025-02-14 15:38:06.065418
Title: Online Inverse Linear Optimization: Improved Regret Bound, Robustness to Suboptimality, and Toward Tight Regret Analysis
Authors: Shinsaku Sakaue, Taira Tsuchiya, Han Bao, Taihei Oki
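Abstract summary: This paper studies an online learning problem in which a learner observes both time-varying sets of feasible actions and an agent's optimal actions. We obtain an $O(n\ln T)$ regret bound, which improves the previous $O(n^4\ln T)$ bound by a factor of $n^3$.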
Reference score (attention score computed by the site): 25.50155563108198
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study an online learning problem where, over $T$ rounds, a learner observes both time-varying sets of feasible actions and an agent's optimal actions, selected by solving linear optimization over the feasible actions. The learner sequentially makes predictions of the agent's underlying linear objective function, and their quality is measured by the regret, the cumulative gap between optimal objective values and those achieved by following the learner's predictions. A seminal work by Bärmann et al. (ICML 2017) showed that online learning methods can be applied to this problem to achieve regret bounds of $O(\sqrt{T})$. Recently, Besbes et al. (COLT 2021, Oper. Res. 2023) significantly improved the result by achieving an $O(n^4\ln T)$ regret bound, where $n$ is the dimension of the ambient space of objective vectors. Their method, based on the ellipsoid method, runs in polynomial time but is inefficient for large $n$ and $T$. In this paper, we obtain an $O(n\ln T)$ regret bound, improving upon the previous bound of $O(n^4\ln T)$ by a factor of $n^3$. Our method is simple and efficient: we apply the online Newton step (ONS) to appropriate exp-concave loss functions. Moreover, for the case where the agent's actions are possibly suboptimal, we establish an $O(n\ln T+\sqrt{\Delta_Tn\ln T})$ regret bound, where $\Delta_T$ is the cumulative suboptimality of the agent's actions. This bound is achieved by using MetaGrad, which runs ONS with $\Theta(\ln T)$ different learning rates in parallel. We also provide a simple instance that implies an $\Omega(n)$ lower bound, showing that our $O(n\ln T)$ bound is tight up to an $O(\ln T)$ factor. This gives rise to a natural question: can the $O(\ln T)$ factor in the upper bound be removed? For the special case of $n=2$, we show that an $O(1)$ regret bound is possible, while we delineate challenges in extending this result to higher dimensions.
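The following is a minimal Python sketch (standard library only) that makes the online protocol and the regret measure concrete. It updates the learner with the online Newton step. It rests on several assumptions that are not taken from the paper. Feasible sets are finite lists of points. The agent maximizes $\langle c^*, x\rangle$. Predictions are restricted to the Euclidean unit ball. ONS is run on a MetaGrad-type surrogate $f_t(c) = -\eta r_t(c) + \eta^2 r_t(c)^2$ with $r_t(c) = \langle \hat{c}_t - c, g_t\rangle$ and $g_t = \hat{x}_t - x_t$, not on the paper's exact loss functions. The choice of $g_t$ follows from the setup: $x_t$ maximizes $\langle c^*, \cdot\rangle$ and $\hat{x}_t$ maximizes $\langle \hat{c}_t, \cdot\rangle$ over the same set, so the per-round regret $\langle c^*, x_t - \hat{x}_t\rangle$ is at most $\langle \hat{c}_t - c^*, g_t\rangle$. The function names and the hyperparameters `eta`, `gamma` and `eps` are hypothetical and untuned.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (small, nonsingular M)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def project_ball(y, A, radius=1.0, iters=100):
    """Project y onto {c : ||c|| <= radius} in the A-norm (A symmetric positive definite).

    KKT conditions give (A + lam*I) c = A y for some lam >= 0. Since ||c(lam)||
    decreases in lam, lam is found by bisection."""
    if math.sqrt(dot(y, y)) <= radius:
        return list(y)
    n = len(y)
    Ay = [dot(row, y) for row in A]

    def c_of(lam):
        M = [[A[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
        return solve(M, Ay)

    def norm_at(lam):
        c = c_of(lam)
        return math.sqrt(dot(c, c))

    lo, hi = 0.0, 1.0
    while norm_at(hi) > radius:  # grow hi until c(hi) lies in the ball
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if norm_at(mid) > radius:
            lo = mid
        else:
            hi = mid
    return c_of(hi)  # hi always yields a point inside the ball


def run_ons(c_star, feasible_sets, eta=0.1, gamma=0.5, eps=1.0):
    """Hypothetical helper: ONS on a MetaGrad-type surrogate (illustrative assumption).

    Returns the per-round regret <c*, x_t - xhat_t> of following the learner's predictions."""
    n = len(c_star)
    A = [[eps if i == j else 0.0 for j in range(n)] for i in range(n)]  # A_0 = eps * I
    c_hat = [0.0] * n
    regrets = []
    for X in feasible_sets:
        x_agent = max(X, key=lambda x: dot(c_star, x))  # agent's optimal action (observed)
        x_learn = max(X, key=lambda x: dot(c_hat, x))   # action induced by the prediction
        regrets.append(dot(c_star, x_agent) - dot(c_star, x_learn))
        g = [a - b for a, b in zip(x_learn, x_agent)]   # subgradient of suboptimality loss at c_hat
        grad = [eta * gi for gi in g]                   # surrogate gradient at c_hat (r_t = 0 there)
        for i in range(n):                              # A_t = A_{t-1} + grad grad^T
            for j in range(n):
                A[i][j] += grad[i] * grad[j]
        step = solve(A, grad)                           # A_t^{-1} grad
        y = [c - s / gamma for c, s in zip(c_hat, step)]
        c_hat = project_ball(y, A)                      # generalized projection in the A_t-norm
    return regrets


if __name__ == "__main__":
    random.seed(0)
    n, T, K = 3, 200, 10
    c_star = [1.0 / math.sqrt(n)] * n  # hypothetical true objective with unit norm
    sets = [[[random.uniform(-1, 1) for _ in range(n)] for _ in range(K)] for _ in range(T)]
    regrets = run_ons(c_star, sets)
    print(f"cumulative regret after {T} rounds: {sum(regrets):.3f}")
```

Running the script prints the cumulative regret for one random instance. The number says nothing about the paper's bounds. ONS projects in the $A_t$-norm rather than the Euclidean norm. Here that projection is computed from the condition $(A_t + \lambda I)c = A_t y$, with $\lambda \ge 0$ found by bisection.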

Related papers