Fugu-MT 論文翻訳(概要): History-Restricted Online Learning

論文の概要: History-Restricted Online Learning

arxiv url: http://arxiv.org/abs/2205.14519v1
Date: Sat, 28 May 2022 20:52:52 GMT
ステータス: 翻訳完了
システム内更新日: 2022-05-31 15:24:37.703883
Title: History-Restricted Online Learning
Title（参考訳）: 履歴制限付きオンライン学習
Authors: Jon Schneider and Kiran Vodrahalli
Abstract要約: 本稿では,履歴制限のないオンライン学習アルゴリズムについて紹介する。オンライン学習アルゴリズム $mathcalA$ が$M$-history-restricted であるとは、その時点での出力 $t$ が$M$以前の報酬の関数として書けることである。
参考スコア（独自算出の注目度）: 15.564663451885371
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce the concept of history-restricted no-regret online learning algorithms. An online learning algorithm $\mathcal{A}$ is $M$-history-restricted if its output at time $t$ can be written as a function of the $M$ previous rewards. This class of online learning algorithms is quite natural to consider from many perspectives: they may be better models of human agents and they do not store long-term information (thereby ensuring ``the right to be forgotten''). We first demonstrate that a natural approach to constructing history-restricted algorithms from mean-based no-regret learning algorithms (e.g. running Hedge over the last $M$ rounds) fails, and that such algorithms incur linear regret. We then construct a history-restricted algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we empirically explore distributions where history-restricted online learners have favorable performance compared to other no-regret algorithms.
Abstract（参考訳）: 本稿では,歴史に制約のないオンライン学習アルゴリズムの概念を紹介する。オンライン学習アルゴリズム $\mathcal{A}$ が$M$-history-restricted であるとは、その時の出力 $t$ が$M$以前の報酬の関数として書けることをいう。このタイプのオンライン学習アルゴリズムは、多くの観点から考えると、非常に自然である: それらは人間のエージェントのより良いモデルであり、長期的な情報を保存しないかもしれない。まず,平均ベース非レグレット学習アルゴリズム(例えば,過去$m$ラウンドにわたってヘッジを実行する)から履歴制限付きアルゴリズムを構築するための自然なアプローチが失敗し,そのようなアルゴリズムが線形後悔を引き起こすことを実証する。そして、履歴制限付きアルゴリズムを構築し、1回あたりの後悔が$\theta(1/\sqrt{m})$となるようにします。最後に、履歴制限のあるオンライン学習者が他の非学習アルゴリズムと比較して好適な性能を持つ分布を実証的に探求する。

関連論文リスト

Near-Optimal Algorithms for Omniprediction [6.874077229518565]
オンライン設定とオフライン設定の両方で、オムニプレディションのためのほぼ最適学習アルゴリズムを提供します。オンライン学習アルゴリズムは、様々な尺度でほぼ最適な複雑さを実現する。オフライン学習アルゴリズムは効率的な$(mathcalL_mathrmBV,mathcalH,varepsilon(m)$)を返す
論文参考訳（メタデータ） (2025-01-28T02:58:37Z)
Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
我々のアルゴリズムが $tildeObig((d+log (|mathcalS|2 |mathcalA|))sqrtKbig)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping is linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, $|mathcalS|$ and $|mathcalA|$ is the standardities of the state and action space。
論文参考訳（メタデータ） (2023-05-15T05:37:32Z)
Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
提案手法は, 既往の結果よりも, 計算量や計算量が多くなるアルゴリズムを提案する。核行列の固有値が指数関数的に減衰すると、我々のアルゴリズムは$O(sqrtmathcalA_T)$の後悔を、$O(ln2T)$の計算複雑性で楽しむ。我々はアルゴリズムをバッチ学習に拡張し、以前の$Oを改善した$O(frac1TsqrtmathbbE[mathcalA_T])$over risk boundを得る。
論文参考訳（メタデータ） (2022-12-26T02:32:20Z)
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
本稿では,一組の逆アルゴリズムを組み合わせ,学習することの問題点について考察する。 Agarwal et al. の CORRAL はこの目標を、$widetildeO(sqrtd S T)$ の残酷なオーバーヘッドで達成している。この問題に触発されて、後悔のオーバーヘッドが百万ドルにしか依存しない大規模バンディットアルゴリズムのバンドを囲む新しいレシピを提案する。
論文参考訳（メタデータ） (2022-02-12T21:55:44Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
自然クエリモデルにより,アルゴリズムが$O(log T)$ regretsを$O(sqrtT)$ hintsで得ることを示す。また、$o(sqrtT)$ hintsは$Omega(sqrtT)$ regretより保証できないことも示しています。
論文参考訳（メタデータ） (2021-11-09T16:50:18Z)
Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
本稿では、ナビゲーションエンジンやレコメンデーションシステムにおけるルーティングアプリケーションによって動機付けられた、コンテキスト線形帯域の次の変種について考察する。我々は、真の点$w*$と分離オラクルが返す超平面の間の全距離を、低い「回帰」を持つ新しい切断平面アルゴリズムを設計する。
論文参考訳（メタデータ） (2021-06-09T05:39:05Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
我々は,損失予測の助けを借りて,文脈的包帯における学習を研究する。最適な後悔は$mathcalO(minsqrtT, sqrtmathcalETfrac13)$である。
論文参考訳（メタデータ） (2020-03-04T07:36:38Z)
Adaptive Online Learning with Varying Norms [45.11667443216861]
オンライン凸最適化アルゴリズムは、あるドメインで$w_t$を出力する。この結果を用いて新しい「完全行列」型後悔境界を得る。
論文参考訳（メタデータ） (2020-02-10T17:22:08Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。