Fugu-MT Paper Translation (Abstract): PAC-Bayesian Reward-Certified Outcome Weighted Learning

Paper overview: PAC-Bayesian Reward-Certified Outcome Weighted Learning

arxiv url: http://arxiv.org/abs/2604.01946v1
Date: Thu, 02 Apr 2026 12:08:56 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.767669
Title: PAC-Bayesian Reward-Certified Outcome Weighted Learning
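Title (reference translation): PAC-Bayesian Reward-Certified Outcome Weighted Learning
Authors: Yuya Ishikawa, Shu Tamano
Abstract summary: Estimating optimal individualized treatment rules (ITRs) via outcome weighted learning (OWL) often relies on observed rewards that are noisy or optimistic proxies for the true latent utility. The paper proposes PAC-Bayesian Reward-Certified Outcome Weighted Learning (PROWL). Given a one-sided uncertainty certificate, PROWL constructs a conservative reward and a strictly policy-dependent lower bound on the true expected value.
Reference score (independently computed attention measure): 0.0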
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating optimal individualized treatment rules (ITRs) via outcome weighted learning (OWL) often relies on observed rewards that are noisy or optimistic proxies for the true latent utility. Ignoring this reward uncertainty leads to the selection of policies with inflated apparent performance, yet existing OWL frameworks lack the finite-sample guarantees required to systematically embed such uncertainty into the learning objective. To address this issue, we propose PAC-Bayesian Reward-Certified Outcome Weighted Learning (PROWL). Given a one-sided uncertainty certificate, PROWL constructs a conservative reward and a strictly policy-dependent lower bound on the true expected value. Theoretically, we prove an exact certified reduction that transforms robust policy learning into a unified, split-free cost-sensitive classification task. This formulation enables the derivation of a nonasymptotic PAC-Bayes lower bound for randomized ITRs, where we establish that the optimal posterior maximizing this bound is exactly characterized by a general Bayes update. To overcome the learning-rate selection problem inherent in generalized Bayesian inference, we introduce a fully automated, bounds-based calibration procedure, coupled with a Fisher-consistent certified hinge surrogate for efficient optimization. Our experiments demonstrate that PROWL achieves improvements in estimating robust, high-value treatment regimes under severe reward uncertainty compared to standard methods for ITR estimation.
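Abstract (reference translation): Estimating optimal individualized treatment rules (ITRs) via outcome weighted learning (OWL) often relies on observed rewards that are noisy or optimistic proxies for the true latent utility. Ignoring this reward uncertainty leads to selecting policies whose apparent performance is inflated, yet existing OWL frameworks lack the finite-sample guarantees needed to systematically embed such uncertainty in the learning objective. This work therefore proposes PAC-Bayesian Reward-Certified Outcome Weighted Learning (PROWL). Given a one-sided uncertainty certificate, PROWL constructs a conservative reward and a strictly policy-dependent lower bound on the true expected value. Theoretically, the authors prove an exact certified reduction that transforms robust policy learning into a unified, split-free cost-sensitive classification task. This formulation enables the derivation of a nonasymptotic PAC-Bayes lower bound for randomized ITRs, and the optimal posterior maximizing this bound is exactly characterized by a general Bayes update. To overcome the learning-rate selection problem inherent in generalized Bayesian inference, a fully automated, bounds-based calibration procedure is introduced, coupled with a Fisher-consistent certified hinge surrogate for efficient optimization. The experiments show that PROWL improves the estimation of robust, high-value treatment regimes under severe reward uncertainty compared with standard methods for ITR estimation.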
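The abstract gives no formulas, so the following is only a rough illustration of two ideas it names: a conservative reward and a general Bayes (Gibbs) update over policies. It is not the authors' implementation. Everything in it is an assumption made for illustration: the conservative reward is taken to be the observed reward minus a nonnegative certificate, the policy value is a plain inverse-propensity-weighted average, the policy class is finite, and all names (`conservative_reward`, `certified_value`, `gibbs_posterior`) and the toy data are hypothetical. The PAC-Bayes bound, the learning-rate calibration, and the certified hinge surrogate are not reproduced.

```python
import math

def conservative_reward(observed, certificate):
    # ASSUMPTION: the one-sided certificate is a nonnegative amount by which
    # the observed reward may overstate the true utility.
    return observed - certificate

def certified_value(policy, data):
    # Inverse-propensity-weighted average of the conservative reward over
    # the samples where the policy agrees with the assigned treatment.
    # data rows: (covariate x, treatment a, reward r, certificate u, propensity p)
    total = 0.0
    for x, a, r, u, p in data:
        if policy(x) == a:
            total += conservative_reward(r, u) / p
    return total / len(data)

def gibbs_posterior(values, prior, learning_rate, n):
    # General Bayes update on a finite policy class:
    # q(h) is proportional to prior(h) * exp(learning_rate * n * value(h)).
    logits = [math.log(pi) + learning_rate * n * v
              for pi, v in zip(prior, values)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical toy data: treatments in {-1, +1}, randomized with probability 0.5.
data = [
    (0.2,  1, 1.0, 0.8, 0.5),
    (0.9,  1, 2.0, 0.1, 0.5),
    (0.1, -1, 1.5, 0.2, 0.5),
    (0.8, -1, 0.5, 0.1, 0.5),
]
policies = [
    lambda x: 1,                      # always treat
    lambda x: -1,                     # never treat
    lambda x: 1 if x > 0.5 else -1,   # treat when the covariate is large
]
prior = [1.0 / len(policies)] * len(policies)
values = [certified_value(h, data) for h in policies]
posterior = gibbs_posterior(values, prior, learning_rate=1.0, n=len(data))
best = max(range(len(policies)), key=lambda i: posterior[i])
print(values, posterior, best)
```

In this toy setting the posterior concentrates on the threshold policy, which has the largest conservative value. Setting `learning_rate` to 0 returns the prior unchanged, which is why the abstract's point about calibrating the learning rate matters for such updates.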

Related papers