Fugu-MT paper translation (abstract): PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks

Paper overview: PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks

arxiv url: http://arxiv.org/abs/2605.10137v1
Date: Mon, 11 May 2026 07:46:10 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.616537
Title: PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
Title (reference translation): PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
Authors: Yan Shuo Tan, Kenyon Ng, Ruizhe Deng, Sumetha Loganathan, Qiong Zhang, Bibhas Chakraborty
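Abstract summary: We propose PFN-TS, a Thompson sampling algorithm that converts PFN posterior predictives into mean-reward samples. PFN-TS achieves the best average rank across nonlinear synthetic and OpenML classification-to-bandit benchmarks.
Reference score (independently computed attention score): 7.188084723389871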
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Thompson sampling is a widely used strategy for contextual bandits: at each round, it samples a reward function from a Bayesian posterior and acts greedily under that sample. Prior-data fitted networks (PFNs), such as TabPFN v2+ and TabICL v2, are attractive candidates for this purpose because they approximate Bayesian posterior predictive distributions in a single forward pass. However, PFNs predict noisy future rewards, while Thompson sampling requires uncertainty over the latent mean reward function. We propose PFN-TS, a Thompson sampling algorithm that converts PFN posterior predictives into mean-reward samples using a subsampled predictive central limit theorem. The method estimates posterior variance from a geometric grid of $O(\log n)$ dataset prefixes rather than the full $O(n)$ predictive sequence used in previous predictive-sequence approaches, and reuses TabICL's cached representations across rounds. We prove consistency of the subsampled variance estimator and give a Bayesian regret bound that decomposes PFN-TS regret into exact posterior-sampling regret under the PFN prior plus approximation terms. Empirically, PFN-TS achieves the best average rank across nonlinear synthetic and OpenML classification-to-bandit benchmarks, remains competitive on linear and BART-generated rewards, and attains the highest estimated policy value in an offline mobile-health evaluation. Code is available at https://anonymous.4open.science/r/PFN_TS-36ED/.
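The abstract states that posterior variance is estimated from a geometric grid of $O(\log n)$ dataset prefixes rather than all $O(n)$ prefixes. The sketch below only illustrates how such a grid could be built and how small it stays as $n$ grows. The helper name `geometric_prefix_grid`, the growth `ratio`, the `start` size, and the choice to always include the full length `n` are assumptions made for illustration; the abstract does not specify the grid the authors use.

```python
def geometric_prefix_grid(n: int, ratio: float = 2.0, start: int = 1) -> list[int]:
    """Return strictly increasing prefix lengths that grow geometrically up to n.

    Hypothetical helper: `ratio`, `start`, and always ending at n are
    illustrative assumptions, not details taken from the paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if start < 1:
        raise ValueError("start must be at least 1")
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1")

    sizes: list[int] = []
    size = float(min(start, n))
    while size < n:
        k = int(size)
        # Skip duplicates that arise when a non-integer ratio rounds down.
        if not sizes or k > sizes[-1]:
            sizes.append(k)
        size *= ratio
    sizes.append(n)  # the full dataset is always the last prefix
    return sizes


grid = geometric_prefix_grid(1000)
print(grid)       # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000]
print(len(grid))  # 11 prefixes instead of 1000
```

With a ratio of 2, the number of prefixes grows roughly like $\log_2 n$, so a predictive model would be evaluated on about a dozen prefixes for a thousand observations instead of on every one.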
