Fugu-MT paper translation (abstract): A single algorithm for both restless and rested rotting bandits

Paper summary: A single algorithm for both restless and rested rotting bandits

arxiv url: http://arxiv.org/abs/2604.21432v1
Date: Thu, 23 Apr 2026 08:48:52 GMT
Status: translation complete
Last updated in system: 2026-04-24 14:40:06.392229
Title: A single algorithm for both restless and rested rotting bandits
Title (reference translation): One algorithm for restless and rested rotting bandits
Authors: Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko
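Abstract summary: This paper introduces an algorithm called Rotting Adaptive Window UCB, which achieves near-optimal regret in both the rotting rested and restless bandit settings. This contrasts with previous negative results showing that no algorithm can achieve similar results once rewards are allowed to increase.
Reference score (independently computed attention score): 65.63283411693489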
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their value decreases over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandits, without any prior knowledge of the setting (rested or restless) and the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.
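Abstract (reference translation): In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with actions tend to decrease over time. This decay is caused either by actions executed in the past (e.g., a user gets bored when songs of the same genre are recommended repeatedly) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings in which arms are rotting (i.e., their value decreases over time). Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. This paper introduces a new algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandits, without prior knowledge of the setting (rested or restless) or the type of non-stationarity (e.g., piecewise constant, bounded variation). This contrasts with previous negative results showing that no algorithm can achieve similar results once rewards are allowed to increase. The theoretical findings are confirmed on a number of synthetic and dataset-based experiments.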

Related papers