Fugu-MT 論文翻訳(概要): Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

論文の概要: Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

arxiv url: http://arxiv.org/abs/2309.06349v1
Date: Tue, 12 Sep 2023 16:15:33 GMT
ステータス: 翻訳完了
システム内更新日: 2023-09-13 12:12:48.903430
Title: Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Title（参考訳）: 分節後部を用いたトンプソンサンプリングの一般化レグレト解析
Authors: Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick
Abstract要約: トンプソンサンプリング(Thompson sample, TS)は、マルチアームバンディット問題を解くアルゴリズムの1つである。 TSの変種である$alpha$-TSを考え、標準的な後続分布の代わりに$alpha$-posteriorまたは$alpha$-posteriorを使用する。
参考スコア（独自算出の注目度）: 12.43000662545423
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior ($\alpha\in(0,1)$) instead of the standard posterior distribution. To compute an $\alpha$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $\alpha$. For $\alpha$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} \Delta_k\left(\frac{\log(T)}{C(\alpha)\Delta_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $\Delta_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(\alpha)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) to that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $\alpha$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
Abstract（参考訳）: トンプソンサンプリング(ts)は、確率的多腕バンディット問題を解決する最もポピュラーで初期のアルゴリズムの一つである。我々は、標準の後方分布の代わりに、分数または$\alpha$-posterior (\alpha\in(0,1)$) を使用する、$\alpha$-ts という ts の変種を考える。 alpha$-posterior を計算するために、標準後方の定義の確率は$\alpha$ という係数でテンダリングされる。インスタンスに依存しない$\mathcal{o}\left(\sum_{k \neq i^*} \delta_k\left(\frac{\log(t)}{c(\alpha)\delta_k^2} + \frac{1}{2} \right)\right)$ およびインスタンスに依存しない$\mathcal{o}(\sqrt{kt\log k})$ プリミティブ分布における非常に穏やかな条件下での頻発的後悔境界、ただし$\delta_k$は$k^{th}$と最高のアームの間の真の平均報酬の差であり、$c(\alpha)$は既知の定数である。ガウス級および指数族モデルの両方が、報酬分布の一般条件を満たす。事前分布の条件は、その密度が正、連続、有界である必要があるだけである。また、改善された UCB [Auer and Ortner, 2010] の値と(定数まで)一致する別のインスタンス依存後悔の上界も確立する。我々の後悔分析は、非漸近濃度分析における最近の理論的発展と、$\alpha$-posterior distributionに対するBernstein-von Mises型の結果とを慎重に組み合わせている。さらに,本解析では,クローズドフォーム後部や共役前駆体などの構造的性質は必要としない。

関連論文リスト

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes [43.68309224776421]
過度にパラメータ化されたモデルのトレーニングにおいて、トレーニングとテストエラーのギャップを分析する。トレーニング時間にもミキシングにも依存せず、次元や勾配規範にも依存せず、損失やモデルの他の特性にも依存しています。
論文参考訳（メタデータ） (2025-05-25T10:49:09Z)
Distribution of lowest eigenvalue in $k$-body bosonic random matrix ensembles [0.8999666725996978]
有限多ボソン系の最小固有値分布を$k$-body相互作用で数値的に検討する。最も低い固有値の分布の最初の4つのモーメントは、$q$パラメータの関数として分析されている。
論文参考訳（メタデータ） (2024-04-30T20:44:31Z)
Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
カーネルベースのスコア推定器は$widetildeOleft(n-1 t-fracd+22(tfracd2 vee 1)rightの最適平均二乗誤差を達成する核を用いたスコア推定器は,拡散モデルで生成した試料の分布の総変動誤差に対して,極小ガウスの下での最大平均2乗誤差を$widetildeOleft(n-1/2 t-fracd4right)$上界で達成することを示す。
論文参考訳（メタデータ） (2024-02-23T20:51:31Z)
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise [45.60098988395789]
マルチアームバンディットのためのUCB型アルゴリズムを構築するための新しいアルゴリズムを提案する。報酬の対称雑音の場合、$O(log TsqrtKTlog T)$ regret bound を $Oleft の代わりに達成できることを示す。
論文参考訳（メタデータ） (2024-02-10T22:38:21Z)
Better and Simpler Lower Bounds for Differentially Private Statistical Estimation [7.693388437377614]
任意の$alpha le O(1)$に対して、ガウスの共分散をスペクトル誤差まで推定するには$tildeOmegaleft(fracd3/2alpha varepsilon + fracdalpha2right)$サンプルが必要である。次に、有界な$k$thモーメントで重み付き分布の平均を推定するには$tildeOmegaleft(fracdalphak/(k-1) varepsilon +
論文参考訳（メタデータ） (2023-10-10T04:02:43Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
非線形測定では、ほとんどの先行結果は一様ではない、すなわち、すべての$mathbfx*$に対してではなく、固定された$mathbfx*$に対して高い確率で保持される。本フレームワークはGCSに1ビット/一様量子化観測と単一インデックスモデルを標準例として適用する。また、指標集合が計量エントロピーが低い製品プロセスに対して、より厳密な境界を生み出す濃度不等式も開発する。
論文参考訳（メタデータ） (2023-09-25T17:54:19Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
我々は、未知の分布から生じる無限に多くのバンドイットアームを用いて純粋探索を研究する。私たちのゴールは、平均的な報酬が1-delta$の1つの高品質なアームを、最高の$eta$-fraction of armsの1つとして$varepsilon$内で効率的に選択することにあります。
論文参考訳（メタデータ） (2023-06-03T04:00:47Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
我々は、$symbol$k$収束問題に対して、新しいマップベースのアルゴリズム(mathsfnorMtext-mathsfSGD$)を提案する。
論文参考訳（メタデータ） (2023-05-10T01:12:11Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
本研究では,データ生成分布の分散が存在しない環境での重み付き平均推定問題について検討する。最小の信頼区間を$n,d,delta$の関数として得る推定器を設計する。
論文参考訳（メタデータ） (2020-11-24T22:39:21Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
学習者が生成モデル$Y = langle X,w* rangle + epsilon$から$n$のサンプルにアクセスできるような高次元頑健な線形回帰問題について検討する。 i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
論文参考訳（メタデータ） (2020-07-16T06:44:44Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
期待される正方形損失から、最も適合した単一ニューロンを学習することの問題点を考察する。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。
論文参考訳（メタデータ） (2020-05-29T07:20:35Z)
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
我々は、他の攻撃モデルに対してスムースな手法を拡張することは困難であることを示す。我々はCIFARに関する実験結果を示し,その理論を検証した。
論文参考訳（メタデータ） (2020-02-08T22:02:14Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。