Fugu-MT 論文翻訳(概要): Super-Exponential Regret for UCT, AlphaGo and Variants

論文の概要: Super-Exponential Regret for UCT, AlphaGo and Variants

arxiv url: http://arxiv.org/abs/2405.04407v1
Date: Tue, 7 May 2024 15:35:30 GMT
ステータス: 翻訳完了
システム内更新日: 2024-05-08 13:31:20.831306
Title: Super-Exponential Regret for UCT, AlphaGo and Variants
Title（参考訳）: UCT, AlphaGo, Variantsの超指数レジストレーション
Authors: Laurent Orseau, Remi Munos,
Abstract要約: 私たちは、UDTが$D$-chain環境上で$exp(dotsexp(1)dots)$ regret($Omega(D)$ exp terms)を持つことを示します。また、この証明をAlphaGoのMCTSとその子孫に適用する。
参考スコア（独自算出の注目度）: 5.49097855598435
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.
Abstract（参考訳）: We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment and that `polynomial' UCT variant have $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contains an oversight for rewards bounded in $[0, 1]$。また、AlphaGoのMCTSとその子孫(例えば、AlphaZero、Leela Zero)にも証明を適用して、$\exp_2(\exp_2(D - O(\log D)))$ regretを示す。

関連論文リスト

High-Dimensional Calibration from Swap Regret [40.9736612423411]
任意の凸集合 $mathcalP 部分集合 mathbbRd$ 上での多次元予測のオンライン校正について検討する。我々は、$O(sqrtrho T)$$$T$ラウンド後の最悪の後悔を保証できるならば、$T = exp(logd/epsilon2) の後、$epsilon$-calibrated 予測を得ることができることを示した。
論文参考訳（メタデータ） (2025-05-27T17:31:47Z)
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits [3.096236408131107]
Omega_alphaBig(frac(k-m+1) Tfrac1alphak1 + frac1alpha+1 sum_i:Delta_i > 0Delta_i1 - 2alphaBig)$である。これは、ストリーミングMABに対する最初の厳密なギャップ依存の後悔境界である。
論文参考訳（メタデータ） (2025-03-04T09:18:35Z)
Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
我々は,帯域幅フィードバックを用いたマルチクラスPAC学習について検討し,入力を$K$ラベルの1つに分類し,予測されたラベルが正しいか否かに制限する。我々の主な貢献は、問題の無知な$(varepsilon,delta)$PACバージョンのための新しい学習アルゴリズムを設計することである。
論文参考訳（メタデータ） (2024-06-18T08:54:04Z)
$α$-$z$-Rényi divergences in von Neumann algebras: data-processing inequality, reversibility, and monotonicity properties in $α,z$ [2.3020018305241337]
本稿では,$alpha$-$z$-R'enyi の変分表現とデータ処理の不等式(DPI)を証明した。パラメータ $alpha,z$ における $D_alpha,z(psi|varphi)$ の単調性特性とその正規化相対エントロピーに対する極限を示す。
論文参考訳（メタデータ） (2024-04-11T10:10:08Z)
$\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
コーディネータモデルにおける分散$ell_p$-regression問題のランダム化通信複雑性について考察する。 p = 2$、すなわち最小二乗回帰の場合、$tildeTheta(sd2 + sd/epsilon)$ bitsの最初の最適境界を与える。 p in (1,2)$ に対して、$tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound を得る。
論文参考訳（メタデータ） (2023-07-11T08:51:53Z)
On the Minimax Regret for Online Learning with Feedback Graphs [5.721380617450645]
強く観察可能な無向フィードバックグラフを用いて,オンライン学習を後悔する上で,上層と下層の境界を改善した。改良された上界$mathcalObigl(sqrtalpha T(ln K)/(lnalpha)bigr)$ hold for any $alpha$ and the lower bounds for bandits and experts。
論文参考訳（メタデータ） (2023-05-24T17:40:57Z)
Does Sparsity Help in Learning Misspecified Linear Bandits? [32.920577630673804]
アルゴリズムは$O(varepsilon-sds)$アクションをクエリすることで、$O(varepsilon)$-optimalアクションを得ることができることを示す。また、サンプルの複雑さに対する上限は、エラーが$O(sdeltavarepsilon)$$$0delta1$を要求する場合、ほぼ厳密であることを示す。
論文参考訳（メタデータ） (2023-03-29T19:58:39Z)
Estimation of Entropy in Constant Space with Improved Sample Complexity [14.718968517824756]
サンプルの複雑さを$(k/epsilon2)cdot textpolylog (1/epsilon)$に削減する新しい定数メモリスキームを提供する。これは$textpolylog (1/epsilon)$ factorまで最適であると推測する。
論文参考訳（メタデータ） (2022-05-19T18:51:28Z)
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
対象ベクトルの少数のエントリのみを問合せすることを目的とした線形回帰のためのアクティブサンプリングアルゴリズムについて検討する。我々はこの$d$への依存が対数的要因まで最適であることを示す。また、損失関数に対して最初の全感度上界$O(dmax1,p/2log2 n)$を提供し、最大で$p$成長する。
論文参考訳（メタデータ） (2021-11-09T00:20:01Z)
Learning low-degree functions from a logarithmic number of random queries [77.34726150561087]
任意の整数 $ninmathbbN$, $din1,ldots,n$ および任意の $varepsilon,deltain(0,1)$ に対して、有界関数 $f:-1,1nto[-1,1]$ に対して、少なくとも$d$ の次数を学ぶことができる。
論文参考訳（メタデータ） (2021-09-21T13:19:04Z)
Matrix hypercontractivity, streaming algorithms and LDCs: the large alphabet case [3.729242965449096]
We consider streaming algorithm for Approximating the value of Unique Games on a hypergraph with $t$-size hyperedges。この値の$(r-varepsilon)$-approximationを達成する逆モデルの全てのストリーミングアルゴリズムは、$Omega(n1-2/t)$量子空間を必要とする。
論文参考訳（メタデータ） (2021-09-06T16:56:19Z)
Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
S$状態、$A$アクション、計画的地平$H$で、エピソードな時間同質なMarkov決定プロセスに関するオフライン強化学習を再考する。経験的MDPを用いた評価と計画のための,約$H$自由なサンプル複雑性境界の最初の集合を得る。
論文参考訳（メタデータ） (2021-03-25T18:52:17Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
すべての決定木に対して、与えられた順序 $ellsqrtbinomdell (1+log n)ell-1,$ sum to at least $cellsqrtbinomdell (1+log n)ell-1,$ where $n$ is the number of variables, $d$ is the tree depth, $c>0$ is a absolute constant。
論文参考訳（メタデータ） (2020-08-24T06:50:57Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。