Fugu-MT Paper Translation (Overview): Online Learning with Bounded Recall

Paper overview: Online Learning with Bounded Recall

arxiv url: http://arxiv.org/abs/2205.14519v2
Date: Fri, 31 May 2024 19:55:56 GMT
Status: Translation complete
Last updated on site: 2024-06-04 23:55:24.645320
Title: Online Learning with Bounded Recall
Authors: Jon Schneider, Kiran Vodrahalli
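Abstract summary: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards.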
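The definition can be written out as follows. The notation is our own and not necessarily the paper's: $x_t$ is the learner's output at round $t$, $r_s$ is the reward vector revealed at round $s$, and $f_t$ is the decision rule. We also read "stationary" in the abstract as meaning that the same rule is used at every round. That reading is our interpretation, not a quoted definition.

```latex
% M-bounded-recall: the round-t output uses only the last M reward vectors
x_t = f_t\bigl(r_{t-M},\, r_{t-M+1},\, \ldots,\, r_{t-1}\bigr)
% stationary (our reading): one rule for every round, i.e. f_t = f for all t
% per-round regret against the best fixed action i over T rounds
\frac{1}{T}\Bigl(\max_{i}\sum_{t=1}^{T} r_{t,i} \;-\; \sum_{t=1}^{T}\langle x_t, r_t\rangle\Bigr)
```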
Reference score (attention score computed by the site): 11.046741824529107
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., of any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that unlike the perfect recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
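The "natural approach" named in the abstract, running Hedge over the last $M$ rounds, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code. The class name `WindowedHedge`, the learning rate `eta`, the helper `average_regret`, and the assumption that rewards lie in $[0,1]$ are all hypothetical. The sketch is the mean-based construction that the abstract says fails. It is not the paper's $\Theta(1/\sqrt{M})$ algorithm.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class WindowedHedge:
    """Hypothetical sketch: Hedge restricted to the last M reward vectors.

    The only state kept is a window of the M most recent reward vectors,
    so the output is a function of those rewards alone (M-bounded-recall).
    """

    def __init__(self, n_actions: int, M: int, eta: float) -> None:
        self.n_actions = n_actions
        self.eta = eta  # learning rate; how to tune it is an assumption here
        # deque(maxlen=M) discards the oldest vector once M are stored.
        self.window: Deque[List[float]] = deque(maxlen=M)

    def distribution(self) -> List[float]:
        # Total reward of each action over the window only.
        totals = [sum(r[i] for r in self.window) for i in range(self.n_actions)]
        # Shift by the maximum before exponentiating to avoid overflow.
        top = max(totals)
        weights = [math.exp(self.eta * (s - top)) for s in totals]
        z = sum(weights)
        return [w / z for w in weights]

    def update(self, rewards: Sequence[float]) -> None:
        self.window.append(list(rewards))


def average_regret(learner: WindowedHedge, reward_seq: Sequence[Sequence[float]]) -> float:
    """Per-round regret versus the best fixed action in hindsight."""
    earned = 0.0
    for r in reward_seq:
        p = learner.distribution()
        earned += sum(pi * ri for pi, ri in zip(p, r))  # expected reward this round
        learner.update(r)
    best = max(sum(r[i] for r in reward_seq) for i in range(learner.n_actions))
    return (best - earned) / len(reward_seq)
```

`distribution` sees the window only through per-action sums. As a result, reordering the last $M$ reward vectors leaves the output unchanged. This learner is therefore a symmetric function of the past $M$ rewards, which is exactly the class that the abstract's final result says must incur constant regret per round on some reward sequence.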

Related papers

Near-Optimal Algorithms for Omniprediction [6.874077229518565]
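We provide near-optimal learning algorithms for omniprediction in both the online and offline settings. Our online learning algorithm achieves near-optimal complexity across a number of measures. Our offline learning algorithm returns an efficient $(\mathcal{L}_{\mathrm{BV}}, \mathcal{H}, \varepsilon(m))$ …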
(2025-01-28T02:58:37Z)
Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
Our algorithm achieves $\tilde{O}\big((d+\log(|\mathcal{S}|^2|\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping that linearly parametrizes the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
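Splitting the logarithm of the product shows that the bound grows only logarithmically in the sizes of the state and action spaces. It also contains no explicit horizon term, which matches the "horizon-free" in the title. The following is our own expansion of the stated bound:

```latex
\log\bigl(|\mathcal{S}|^2|\mathcal{A}|\bigr) = 2\log|\mathcal{S}| + \log|\mathcal{A}|
\quad\Longrightarrow\quad
\tilde{O}\Bigl(\bigl(d + 2\log|\mathcal{S}| + \log|\mathcal{A}|\bigr)\sqrt{K}\Bigr)
```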
(2023-05-15T05:37:32Z)
Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
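We propose an algorithm whose regret bound and computational complexity improve on previous results. If the eigenvalues of the kernel matrix decay exponentially, our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2 T)$. We extend the algorithm to batch learning and obtain an $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound, which improves on the previous $O(\ldots)$ bound.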
(2022-12-26T02:32:20Z)
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
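We consider the problem of combining and learning over a set of adversarial bandit algorithms. The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of $\widetilde{O}(\sqrt{dST})$. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead depends only logarithmically on $M$.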
(2022-02-12T21:55:44Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that, in a natural query model, an algorithm can obtain $O(\log T)$ regret with $O(\sqrt{T})$ hints. We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
(2021-11-09T16:50:18Z)
Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
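We consider the following variant of contextual linear bandits, motivated by routing applications in navigation engines and recommendation systems. We design novel cutting-plane algorithms with low "regret", meaning a low total distance between the true point $w^*$ and the hyperplanes returned by the separation oracle.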
(2021-06-09T05:39:05Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
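We study online learning with bandit feedback, where the learner has access only to a zeroth-order oracle and the cost/reward functions admit a "pseudo-1d" structure. We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ on the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.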
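The following short derivation shows which term of the lower bound is active. It uses only the expression stated above, and the case split is our own arithmetic.

```latex
\sqrt{dT} \le T^{3/4}
\iff dT \le T^{3/2}
\iff d \le \sqrt{T}
% so min(sqrt(dT), T^{3/4}) = sqrt(dT) when d <= sqrt(T), and T^{3/4} otherwise
```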
(2021-02-15T08:16:51Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors. The optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$.
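The following derivation identifies when the predictor-dependent term beats the plain $\sqrt{T}$ rate. It is our own arithmetic on the stated bound. The summary does not define $\mathcal{E}$, and we read it loosely as a measure of the predictors' total error.

```latex
\sqrt{\mathcal{E}}\,T^{1/3} \le \sqrt{T}
\iff \mathcal{E}\,T^{2/3} \le T
\iff \mathcal{E} \le T^{1/3}
```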
(2020-03-04T07:36:38Z)
Adaptive Online Learning with Varying Norms [45.11667443216861]
An online convex optimization algorithm outputs a point $w_t$ in some domain. We use this result to obtain new "full-matrix"-style regret bounds.
(2020-02-10T17:22:08Z)

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

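The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.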