Fugu-MT 論文翻訳(概要): From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces

論文の概要: From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces

arxiv url: http://arxiv.org/abs/2310.19786v2
Date: Tue, 31 Oct 2023 17:57:22 GMT
ステータス: 翻訳完了
システム内更新日: 2023-11-01 18:40:20.351185
Title: From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces
Title（参考訳）: 外部からswap regret 2.0: 大きなアクションスペースに対する効率的な削減と必然的な敵意
Authors: Yuval Dagan and Constantinos Daskalakis and Maxwell Fishelson and Noah Golowich
Abstract要約: 我々は、ある仮説クラスに対して外部回帰アルゴリズムが存在しない場合、同じクラスに対して非スワップ回帰アルゴリズムが存在することも示している。我々の減少は、あるゲームで非回帰学習が可能であるならば、このゲームは近似された平衡を持つ必要があることを意味する。
参考スコア（独自算出の注目度）: 29.29647282754262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by {\epsilon} after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$ per iteration complexity, where $N$ is the number of experts, while the classical reductions of Blum-Mansour and Stolz-Lugosi require $O(N/\epsilon^2)$ rounds and at least $\Omega(N^2)$ per iteration complexity. Our result comes with an associated lower bound, which -- in contrast to that in [BM07] -- holds for oblivious and $\ell_1$-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be $\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria, of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of correlated equilibrium which vastly extends the requirement that the action set is finite, thus answering a question left open by [DG22; Ass+23]. Moreover, it answers several outstanding questions about equilibrium computation and/or learning in games.
Abstract（参考訳）: 本稿では,blum-mansour [bm07] と stolz-lugosi [sl05] の古典的還元により,swap-regret 最小化から外部-regret 最小化への新しい還元法を提案する。ある仮説クラスに対して外部回帰アルゴリズムが存在しない場合、同じクラスに対して非スワップ回帰アルゴリズムが存在することも示している。専門家のアドバイスで学ぶ問題については,スワップの後悔は1回あたり$\log(n)^{o(1/\epsilon)$ と1回あたり$o(n)$ (n$ は専門家の数) の後に {\epsilon} で区切られることを保証できること,一方,blum-mansour と stolz-lugosi の古典的な還元には$o(n/\epsilon^2)$ と少なくとも $\omega(n^2)$ の反復複雑性が必要であることを示唆する。結果として,[bm07]のそれとは対照的に,[bm07]では,専門家よりもディストリビューションを採用可能な,限定的かつ$\ell_1$-constrainedadversariesと学習者に対して,ラウンド数を$\tilde\omega(n/\epsilon^2)$あるいは$/\epsilon$の指数値でなくてはなりません。我々の減少は、あるゲームで非回帰学習が可能であるならば、このゲームは任意によい近似の近似平衡を持つ必要があることを意味する。これは、近似的粗相関平衡が存在するという非回帰学習の民俗学的な含意を強める。重要なことに、作用集合が有限であるという要件を大きく広げた相関平衡が存在するための十分な条件を与え、 [dg22; ass+23] によって開かれた問題に答える。さらに、ゲームにおける平衡計算や学習に関するいくつかの卓越した疑問に答える。

関連論文リスト

Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
純粋な戦略ナッシュ均衡が存在するとき、$c$ は 0 となり、最適のインスタンス依存後悔境界となることを示す。また,本アルゴリズムは最終段階の収束性も享受し,ほぼ最適サンプルを用いて純粋な戦略ナッシュ均衡を同定することができる。
論文参考訳（メタデータ） (2025-02-24T20:20:06Z)
Alternating Regret for Online Convex Optimization [43.12477663757647]
連続Hedgeアルゴリズムは,任意の逆数$次元OCO問題に対して,$tildemathcalO(dfrac23Tfrac13)$の繰り返し後悔を実現することを示す。例えば、Regret Matching 変種に対する$Omega(sqrtT)$ lower boundなどである。
論文参考訳（メタデータ） (2025-02-18T04:32:52Z)
Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
乗算重み更新などの既存の学習アルゴリズムが最適に近いことを示す。結果はKothari と Mehta が提案したアルゴリズムの枠組みで得られた。
論文参考訳（メタデータ） (2024-11-04T00:39:52Z)
Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
我々は、ほぼ最適の$T$-sparse CCEの計算限界を低く証明する。特に,最大傾斜角の不適応性は,時間内に非自明な間隔を達成できないことを示す。
論文参考訳（メタデータ） (2024-11-04T00:34:56Z)
Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) はブラックウェルのアプローチ可能性と非回帰学習が等価であることを示した。一般化された後悔最小化の例に対して、いかなるアプローチ可能性のインスタンスも厳格に削減できることが示される。
論文参考訳（メタデータ） (2024-06-10T23:23:52Z)
Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints [21.412085244757336]
適応性制約を伴うマルチエージェント強化学習(MARL)の問題について検討する。 2つのプレイヤーゼロサムマルコフゲームに対して、$widetildeO(sqrtH3 S2 ABK)$を後悔する(政治)排除に基づくアルゴリズムを設計する。バッチ複雑性の低い$Omega(fracHlog_AK+loglog K)$を$widetildeO(sqrtK)$で証明する。
論文参考訳（メタデータ） (2024-02-02T03:00:40Z)
Fast swap regret minimization and applications to approximate correlated equilibria [20.047624953965965]
任意の定数$varepsilon>0$に対して、我々のアルゴリズムは$varepsilon T$-swap regretを$T = Mathsfpolylog(n)$ roundsで取得する。我々のアルゴリズムは$varepsilon$に指数関数的依存を持つが、我々は一致する新しい下界を証明している。
論文参考訳（メタデータ） (2023-10-30T15:35:24Z)
Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
本稿では,2プレイヤーゼロサムゲームにおいて,$widetildemathcalO((XA+YB)/varepsilon2)$プレイのエピソードのみを必要とするアルゴリズムの最初の行を,$varepsilon$-approximate Nash平衡を求める。これにより$widetildemathcalO((X2A+Y2B)/varepsilon2)$が$widetildemathcalO(maxX,
論文参考訳（メタデータ） (2022-02-03T18:18:28Z)
Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
マルチプレイヤーの汎用正規形式ゲームにおいて,OMWU(Optimistic Multiplicative Weights Update)を用いているエージェントが全員,O(textrmpolylog(T))$(T$)$(T$)$(OMWU)$(OMWU)$(OMWU)$(OMWU)$(OMWU)$)であることを示す。外部の後悔から内部の後悔へと結果を拡張し、後悔を交換することで、近似した平衡に収束する非結合学習ダイナミクスを確立する。
論文参考訳（メタデータ） (2021-11-11T01:19:53Z)
No Discounted-Regret Learning in Adversarial Bandits with Delays [40.670563413892154]
アルゴリズムが「割引なし」の場合、予想される遊びの割引エルゴジック分布が粗相関平衡(CCE)の集合に収束することを示した。ゼロサムゲームでは、Nash平衡のセットに収束する割引エルゴディック平均のプレイには、ディスカウントレグレットが十分ではないことを示します。
論文参考訳（メタデータ） (2021-03-08T05:15:31Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。