Fugu-MT Paper Translation (Summary): Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

Paper summary: Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

arxiv url: http://arxiv.org/abs/2302.14407v1
Date: Tue, 28 Feb 2023 08:42:42 GMT
Status: Translation complete
Last updated in system: 2023-03-01 17:30:50.228648
Title: Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits
Title (reference translation): An Asymptotically Optimal Thompson Sampling Method for Uniform Bandits and Gaussian Bandits
Authors: Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama
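Abstract summary: We show that a switch of noninformative priors drastically affects the expected regret. We propose a TS-based policy called TS with Truncation (TS-T).
Reference score (attention score computed by this site): 79.90616674042151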
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Thompson sampling (TS) for the parametric stochastic multi-armed bandits has been well studied under the one-dimensional parametric models. It is often reported that TS is fairly insensitive to the choice of the prior when it comes to regret bounds. However, this property is not necessarily true when multiparameter models are considered, e.g., a Gaussian model with unknown mean and variance parameters. In this paper, we first extend the regret analysis of TS to the model of uniform distributions with unknown supports. Specifically, we show that a switch of noninformative priors drastically affects the regret in expectation. Through our analysis, the uniform prior is proven to be the optimal choice in terms of the expected regret, while the reference prior and the Jeffreys prior are found to be suboptimal, which is consistent with previous findings in the model of Gaussian distributions. However, the uniform prior is specific to the parameterization of the distributions, meaning that if an agent considers different parameterizations of the same model, the agent with the uniform prior might not always achieve the optimal performance. In light of this limitation, we propose a slightly modified TS-based policy, called TS with Truncation (TS-T), which can achieve the asymptotic optimality for the Gaussian distributions and the uniform distributions by using the reference prior and the Jeffreys prior that are invariant under one-to-one reparameterizations. The pre-processing of the posterior distribution is the key to TS-T, where we add an adaptive truncation procedure on the parameter space of the posterior distributions. Simulation results support our analysis, where TS-T shows the best performance in a finite-time horizon compared to other known optimal policies, while TS with the invariant priors performs poorly.
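Abstract (reference translation): Thompson sampling (TS) for parametric stochastic multi-armed bandits has been well studied under one-dimensional parametric models. It is often reported that, as far as regret bounds are concerned, TS is fairly insensitive to the choice of prior. However, this property does not necessarily hold when multiparameter models are considered, for example a Gaussian model with unknown mean and variance parameters. In this paper, we first extend the regret analysis of TS to the model of uniform distributions with unknown supports. Specifically, we show that switching between noninformative priors drastically affects the expected regret. Our analysis proves that the uniform prior is the optimal choice in terms of expected regret, whereas the reference prior and the Jeffreys prior turn out to be suboptimal, consistent with earlier findings for the Gaussian model. However, the uniform prior is specific to the parameterization of the distributions: if an agent considers a different parameterization of the same model, an agent with the uniform prior does not necessarily achieve optimal performance. In light of this limitation, we propose a slightly modified TS-based policy called TS with Truncation (TS-T), which achieves asymptotic optimality for the Gaussian and uniform distributions using the reference prior and the Jeffreys prior, both of which are invariant under one-to-one reparameterization. Pre-processing of the posterior distribution is the key to TS-T: an adaptive truncation procedure is added on the parameter space of the posterior distribution. Simulation results show that TS-T achieves the best finite-horizon performance among known optimal policies, while TS with the invariant priors performs poorly.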
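To make the role of the prior concrete, the following is a minimal sketch of plain Thompson sampling for Gaussian arms with unknown mean and variance. It is not the authors' TS-T policy: the abstract does not specify the adaptive truncation step, so that step is left out. The sketch assumes a prior proportional to (sigma^2)^(-1 - alpha), under which the marginal posterior of the mean is a Student-t distribution with n + 2*alpha - 1 degrees of freedom. The function names, the `alpha` parameter, and the number of initial pulls `n_init` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_posterior_mean(rewards, alpha, rng):
    """Draw one sample of an arm's mean from its marginal posterior.

    Assumes Gaussian rewards and a prior proportional to
    (sigma^2)^(-1 - alpha), which yields a Student-t marginal with
    n + 2*alpha - 1 degrees of freedom (must be positive).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    sum_sq = sum((x - mean) ** 2 for x in rewards)
    dof = n + 2 * alpha - 1
    # Student-t variate built from a standard normal and a chi-square variate.
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    t = z / math.sqrt(chi2 / dof)
    return mean + t * math.sqrt(sum_sq / (n * dof))


def run_thompson_sampling(arm_means, arm_stds, horizon, alpha=-1.0, seed=0):
    """Simulate plain TS (no truncation) and return the pull count per arm."""
    rng = random.Random(seed)
    # Assumption: pull each arm until the posterior has at least one
    # degree of freedom (and at least two observations).
    n_init = max(2, math.ceil(2 - 2 * alpha))
    rewards = [[] for _ in arm_means]
    for _ in range(horizon):
        under_sampled = [k for k, r in enumerate(rewards) if len(r) < n_init]
        if under_sampled:
            arm = under_sampled[0]
        else:
            samples = [sample_posterior_mean(r, alpha, rng) for r in rewards]
            arm = max(range(len(samples)), key=samples.__getitem__)
        rewards[arm].append(rng.gauss(arm_means[arm], arm_stds[arm]))
    return [len(r) for r in rewards]


if __name__ == "__main__":
    # Two hypothetical arms; the second has the higher mean.
    print(run_thompson_sampling([0.0, 1.0], [1.0, 1.0], horizon=2000))
```

In this parameterization, `alpha = -1` corresponds to the uniform prior on (mu, sigma^2), `alpha = 0` to the reference prior, and `alpha = 0.5` to the Jeffreys prior, so changing `alpha` is one way to experiment with the prior sensitivity that the abstract describes.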

Related papers

Learning Parametric Distributions from Samples and Preferences [19.879505582147807]
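We show that preference-based M-estimators achieve a better variance than sample-only M-estimators. We propose an estimator that achieves an estimation error scaling of $\mathcal{O}(1/n)$, a significant improvement over $\Theta(1/\sqrt{n})$.
Paper reference translation (metadata) (2025-05-29T15:33:43Z)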
Calibrated Multi-Preference Optimization for Aligning Diffusion Models [92.90660301195396]
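Calibrated Preference Optimization (CaPO) is a novel method for aligning text-to-image (T2I) diffusion models. CaPO incorporates general preferences from multiple reward models without human-annotated data. Experimental results indicate that CaPO consistently outperforms prior methods.
Paper reference translation (metadata) (2025-02-04T18:59:23Z)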
Continuous Bayesian Model Selection for Multivariate Causal Discovery [22.945274948173182]
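Current causal discovery approaches require restrictive model assumptions or access to interventional data to ensure structure identifiability. Recent work has shown that Bayesian model selection can greatly improve accuracy by exchanging restrictive modelling for more flexible assumptions. We demonstrate the competitiveness of our approach on both synthetic and real-world datasets.
Paper reference translation (metadata) (2024-11-15T12:55:05Z)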
Rényi Neural Processes [14.11793373584558]
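We propose Rényi Neural Processes (RNP) to ameliorate the impact of prior misspecification. The density ratio $\frac{p}{q}$ is scaled by the power of $(1-\alpha)$ in the divergence gradients with respect to the posterior. Experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
Paper reference translation (metadata) (2024-05-25T00:14:55Z)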
Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
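We investigate the benefits and drawbacks of directly estimating the most likely function implied by a model and the data. Function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
Paper reference translation (metadata) (2023-11-27T16:39:55Z)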
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
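The expected predictive information gain (EPIG) measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance than BALD across a range of datasets and models.
Paper reference translation (metadata) (2023-04-17T10:59:57Z)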
Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits [81.45853204922795]
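Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models. This work addresses the optimality of TS for the Pareto model, which has a heavy tail and is parameterized by two unknown parameters. TS with the Jeffreys and reference priors can achieve the lower bound when a truncation procedure is used.
Paper reference translation (metadata) (2023-02-03T04:47:14Z)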
On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
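Many recent studies propose fine-tuning only a small portion of the parameters while keeping most parameters shared across different tasks. We show that these methods are all sparse fine-tuned models and conduct a novel theoretical analysis of them. Despite the effectiveness of sparsity grounded in our theory, how to choose the tunable parameters remains an open problem.
Paper reference translation (metadata) (2022-11-28T17:41:48Z)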
Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
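This work provides theoretical guarantees for Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
Paper reference translation (metadata) (2022-11-11T02:23:39Z)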
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
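We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness. The model soup approach extends to multiple image classification and natural language processing tasks.
Paper reference translation (metadata) (2022-03-10T17:03:49Z)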
AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization [14.531550983885772]
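We propose AdaTerm, a novel approach that incorporates the Student's t-distribution. It provides a unified treatment of the optimization process, offering for the first time a comprehensive framework under the statistical model of the t-distribution.
Paper reference translation (metadata) (2022-01-18T03:13:19Z)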

The related papers list is generated automatically from the titles and abstracts of papers on this site.
