Fugu-MT 論文翻訳(概要): Data-Dependent Regret Bounds for Constrained MABs

論文の概要: Data-Dependent Regret Bounds for Constrained MABs

arxiv url: http://arxiv.org/abs/2505.20010v1
Date: Mon, 26 May 2025 14:00:36 GMT
ステータス: 翻訳完了
システム内更新日: 2025-05-27 16:58:43.486301
Title: Data-Dependent Regret Bounds for Constrained MABs
Title（参考訳）: 制約MABに対するデータ依存回帰境界
Authors: Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti,
Abstract要約: 本稿では,制約付きMAB設定におけるデータ依存的後悔境界の研究を開始する。データ依存の後悔境界は制約の存在によって引き起こせるのか? 具体的には、制約のある最も困難で自然な設定に重点を置いています。
参考スコア（独自算出の注目度）: 36.19086479548864
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper initiates the study of data-dependent regret bounds in constrained MAB settings. These bounds depend on the sequence of losses that characterize the problem instance. Thus, they can be much smaller than classical $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds, while being equivalent to them in the worst case. Despite this, data-dependent regret bounds have been completely overlooked in constrained MAB settings. The goal of this paper is to answer the following question: Can data-dependent regret bounds be derived in the presence of constraints? We answer this question affirmatively in constrained MABs with adversarial losses and stochastic constraints. Specifically, our main focus is on the most challenging and natural settings with hard constraints, where the learner must ensure that the constraints are always satisfied with high probability. We design an algorithm with a regret bound consisting of two data-dependent terms. The first term captures the difficulty of satisfying the constraints, while the second one encodes the complexity of learning independently of the presence of constraints. We also prove a lower bound showing that these two terms are not artifacts of our specific approach and analysis, but rather the fundamental components that inherently characterize the complexities of the problem. Finally, in designing our algorithm, we also derive some novel results in the related (and easier) soft constraints settings, which may be of independent interest.
Abstract（参考訳）: 本稿では,制約付きMAB設定におけるデータ依存的後悔境界の研究を開始する。これらの境界は、問題インスタンスを特徴づける損失の順序に依存する。したがって、それらは古典的な $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds よりもはるかに小さいが、最悪の場合はそれと同値である。それにもかかわらず、データ依存の後悔境界は、制約されたMAB設定で完全に見過ごされている。本論文の目的は,次のような疑問に答えることである。データ依存の後悔境界は,制約の存在下で導出できるのだろうか? 我々は,対向的損失と確率的制約を有する制約付きMABにおいて,この疑問に肯定的に答える。具体的には,制約が常に高い確率で満たされていることを学習者が保証しなければならない,制約のある最も困難で自然な設定に重点を置いています。 2つのデータ依存項からなる残差境界を持つアルゴリズムを設計する。第一項は制約を満たすことの難しさを捉え、第二項は制約の存在から独立して学習の複雑さを符号化する。また、これらの2項は我々の特定のアプローチや分析の成果物ではなく、問題の複雑さを本質的に特徴づける基本成分であることを示す下界も証明している。最後に、アルゴリズムの設計において、関連する(かつ容易な)ソフト制約設定において、いくつかの新しい結果が導出される。

関連論文リスト

Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
我々は,学習者が任意の長期制約を満たすことなく報酬を最大化することを目的とした,knapsacks問題によるバンディットの一般化に対処する。私たちのゴールは、双方の制約の下で機能するベスト・オブ・ザ・ワールドのアルゴリズムを設計することです。
論文参考訳（メタデータ） (2024-05-25T08:09:36Z)
No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
本研究は, 一次アルゴリズムと双対アルゴリズムを弱適応化させることにより, 制約のサブ線形違反を回避可能であることを示す。最初のケースでは、アルゴリズムがサブ線形後悔を保証することを示し、後者の場合、厳密な競合比を$rho/(1+rho)$とする。この結果から,線形制約付き文脈帯域問題に対する新たな結果が得られる。
論文参考訳（メタデータ） (2024-05-10T16:22:33Z)
Learning Adversarial MDPs with Stochastic Hard Constraints [37.24692425018]
我々は,制約付きマルコフ決定過程(CMDP)におけるオンライン学習について,敵対的損失と厳しい制約を伴って検討した。我々の研究は、敵の損失と厳しい制約の両方にかかわるCMDPを初めて研究した。
論文参考訳（メタデータ） (2024-03-06T12:49:08Z)
Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation [86.8475564814154]
制約付き最適化問題を直接実行することは可能かつ有益であることを示す。メモリベースのメソッドでは、以前のタスクからのサンプルの小さなサブセットをリプレイバッファに格納できる。両変数は,制約摂動に対する連続学習問題の最適値の感度を示す。
論文参考訳（メタデータ） (2023-09-29T21:23:27Z)
Lower Bounds for Learning in Revealing POMDPs [88.23337313766355]
本稿では, 難易度の高い環境下での強化学習(RL)の基本的限界について検討する。 Emphmulti-steping POMDPs に対して、潜伏状態空間依存はサンプル複雑性において少なくとも$Omega(S1.5)$であることを示す。
論文参考訳（メタデータ） (2023-02-02T18:59:30Z)
A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
我々は,意思決定者が長期的制約の対象となる一連の意思決定をしなければならないオンライン学習問題について検討する。目標は、全報酬を最大化し、同時に、$T$ラウンド全体で小さな累積違反を達成することである。本稿では,この一般クラス問題に対して,未知のモデルに基づいて報酬と制約が選択された場合と,各ラウンドで敵が選択した場合の双方において,最良世界型アルゴリズムを提示する。
論文参考訳（メタデータ） (2022-09-15T16:59:19Z)
An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
データから制約をマイニングするための一般的なフレームワークを提案する。特に、構造化された出力予測の推論を整数線形プログラミング(ILP)問題とみなす。提案手法は,9×9のスドクパズルの解法を学習し,基礎となるルールを提供することなく,例からツリー問題を最小限に分散させることが可能であることを示す。
論文参考訳（メタデータ） (2020-06-18T20:09:53Z)
Stochastic bandits with arm-dependent delays [102.63128271054741]
我々は、単純なUCBベースのアルゴリズムであるPatentBanditsを提案する。問題に依存しない境界も問題に依存しない境界も、性能の低い境界も提供します。
論文参考訳（メタデータ） (2020-06-18T12:13:58Z)
Exact MAP-Inference by Confining Combinatorial Search with LP Relaxation [19.660527989370646]
我々は、その最適値に対する下界を自然に定義する緩和の族を提案する。この族は常にゆるやかな緩和を含んでおり、我々はそれを見つけることができるアルゴリズムを与え、したがって、最初の非緩和NP-ハード問題を解く。緩和を考えると、元の問題を2つの非重複部分(LP-tight 部分と難しい部分)に分解する。
論文参考訳（メタデータ） (2020-04-14T09:10:47Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。