Fugu-MT Paper Translation (Summary): Robust Parameter Learning for Uncertain MDPs

Paper summary: Robust Parameter Learning for Uncertain MDPs

arxiv url: http://arxiv.org/abs/2605.01339v1
Date: Sat, 02 May 2026 09:22:05 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.715123
Title: Robust Parameter Learning for Uncertain MDPs
Authors: Yannik Schnitzer, Alessandro Abate, David Parker
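Abstract summary: Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. This paper proposes learning such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. Statistical uncertainty from empirical transition frequencies is projected onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP.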
Reference score (site-computed attention measure): 55.60489406616378
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. These models use, for example, confidence intervals to capture transition uncertainty and allow synthesis of policies that are robust to this uncertainty. However, this approach typically quantifies uncertainty independently for individual transition probabilities, ignoring dependencies due to shared latent quantities. We propose to learn such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. We project statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP that respects the algebraic dependencies between transitions. The resulting models are algorithmically challenging to solve, so we propose a hierarchy of sound polytopic outer approximations of the induced confidence set. We implement and evaluate our approach, demonstrating substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
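The contrast the abstract draws between per-transition intervals and a shared parameter can be illustrated with a toy calculation. The sketch below is not the authors' method. It assumes a hypothetical pMDP in which two states, `s0` and `s1`, each have one transition whose probability is the same parameter `p`. It also assumes Hoeffding's inequality with a union bound as the interval construction. All names and counts are invented for illustration.

```python
import math


def hoeffding_interval(successes, trials, delta):
    """Two-sided Hoeffding confidence interval for a Bernoulli mean, clipped to [0, 1]."""
    p_hat = successes / trials
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


def intersect(a, b):
    """Intersection of two closed intervals; raises if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("intervals are disjoint")
    return lo, hi


# Hypothetical data: (times the p-labelled transition was taken, visits to the state).
counts = {"s0": (52, 100), "s1": (230, 500)}
delta = 0.05
delta_each = delta / len(counts)  # union bound over the two estimates

# Interval view: each transition gets its own, independent interval.
intervals = {s: hoeffding_interval(k, n, delta_each) for s, (k, n) in counts.items()}

# Parametric view (assumption: both transitions equal p), so p must lie in
# every per-transition interval simultaneously.
p_interval = (0.0, 1.0)
for iv in intervals.values():
    p_interval = intersect(p_interval, iv)

for state, (lo, hi) in intervals.items():
    print(f"{state}: independent interval [{lo:.4f}, {hi:.4f}]")
print(f"shared parameter p: [{p_interval[0]:.4f}, {p_interval[1]:.4f}]")
```

In this toy case the interval for `p` is never wider than either per-transition interval, because the sparsely sampled state `s0` inherits the tighter bound obtained at `s1`. The paper's confidence sets over general parameter expressions are more involved than a one-dimensional intersection.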

Related papers