Fugu-MT Paper Translation (Summary): Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability

Paper summary: Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability

arxiv url: http://arxiv.org/abs/2510.23744v1
Date: Mon, 27 Oct 2025 18:24:11 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.411894
Title: Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
Authors: Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, Nils Jansen
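Abstract summary: Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. This paper shows that ME-POMDPs can be generalized to POMDPs with sets of initial beliefs (adversarial-belief POMDPs, AB-POMDPs). It then devises exact and approximate (point-based) algorithms to compute robust policies for AB-POMDPs.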
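The first step in the summary, recasting a set of POMDPs as one POMDP whose uncertainty lives in the initial belief, can be sketched with a product construction: the environment index becomes a hidden state component that never changes, and each environment contributes one initial belief. This is one natural construction assumed here for illustration; the paper's exact definitions may differ. The sketch represents each environment as a triple of plain functions `T(s, a)`, `O(s2, a)`, `R(s, a)`; `product_pomdp`, `initial_beliefs`, `make_env`, and the two-state example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Dist = Dict[Hashable, float]
# One environment = (T(s, a) -> {s2: p}, O(s2, a) -> {z: p}, R(s, a) -> reward)
Env = Tuple[Callable, Callable, Callable]


def product_pomdp(envs: List[Env]) -> Env:
    """Merge k environments into one POMDP over product states (s, i).

    The environment index i is copied unchanged by every transition and is
    never observed directly, so the agent can only infer it from observations.
    """

    def T(state, a) -> Dist:
        s, i = state
        return {(s2, i): p for s2, p in envs[i][0](s, a).items()}

    def O(state2, a) -> Dist:
        s2, i = state2
        return envs[i][1](s2, a)

    def R(state, a) -> float:
        s, i = state
        return envs[i][2](s, a)

    return T, O, R


def initial_beliefs(b0: Dist, k: int) -> List[Dist]:
    """One initial belief per environment: b0 over s, all mass on index i."""
    return [{(s, i): p for s, p in b0.items()} for i in range(k)]


def make_env(accuracy: float) -> Env:
    """Hypothetical 2-state example: states, observations, actions are 0 and 1."""

    def T(s, a):
        return {s: 1.0}  # the state never changes

    def O(s2, a):
        return {s2: accuracy, 1 - s2: 1.0 - accuracy}  # noisy reading of s2

    def R(s, a):
        return 1.0 if a == s else 0.0  # reward for guessing the state

    return T, O, R


envs = [make_env(0.9), make_env(0.6)]  # the environments differ only in O
T, O, R = product_pomdp(envs)
beliefs = initial_beliefs({0: 0.5, 1: 0.5}, len(envs))
print(T((0, 1), 0))  # {(0, 1): 1.0}
print(len(beliefs))  # 2
```

Starting the product POMDP from belief i reproduces environment i exactly, so a policy's worst case over this belief set equals its worst case over the environments. Because a fixed policy's expected reward is linear in the initial belief, adding mixtures of these beliefs to the set would not lower that worst case.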
Reference score (site-computed attention score): 29.63953552645502
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. ME-POMDPs represent a finite set of POMDPs that share the same state, action, and observation spaces, but may arbitrarily vary in their transition, observation, and reward models. Such models arise, for instance, when multiple domain experts disagree on how to model a problem. The goal is to find a single policy that is robust against any choice of POMDP within the set, i.e., a policy that maximizes the worst-case reward across all POMDPs. We generalize and expand on existing work in the following way. First, we show that ME-POMDPs can be generalized to POMDPs with sets of initial beliefs, which we call adversarial-belief POMDPs (AB-POMDPs). Second, we show that any arbitrary ME-POMDP can be reduced to a ME-POMDP that only varies in its transition and reward functions or only in its observation and reward functions, while preserving (optimal) policies. We then devise exact and approximate (point-based) algorithms to compute robust policies for AB-POMDPs, and thus ME-POMDPs. We demonstrate that we can compute policies for standard POMDP benchmarks extended to the multi-environment setting.
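The objective stated in the abstract, a single policy that maximizes the worst-case reward across all POMDPs in the set, can be made concrete with a brute-force sketch. It evaluates a fixed deterministic policy, which maps observation histories to actions, exactly in each environment over a short undiscounted finite horizon, takes the minimum, and then compares a hand-written shortlist of candidate policies. The tiger-style problem (listening accuracy 0.95 vs 0.6, i.e. two experts disagreeing on the observation model) and all function names are hypothetical; this is not the exact or point-based algorithm from the paper, only an illustration of the max-min criterion.

```python
from typing import Callable, Dict, Tuple

Dist = Dict[str, float]
History = Tuple[str, ...]


def make_tiger(accuracy: float):
    """Hypothetical tiger-style environment; only the listening accuracy varies."""

    def T(s: str, a: str) -> Dist:
        return {s: 1.0}  # simplification: the tiger never moves

    def O(s2: str, a: str) -> Dist:
        if a != "listen":
            return {"hear_left": 0.5, "hear_right": 0.5}  # uninformative
        correct = "hear_left" if s2 == "left" else "hear_right"
        wrong = "hear_right" if s2 == "left" else "hear_left"
        return {correct: accuracy, wrong: 1.0 - accuracy}

    def R(s: str, a: str) -> float:
        if a == "listen":
            return -1.0
        opened = "left" if a == "open_left" else "right"
        return -100.0 if opened == s else 10.0

    return T, O, R


def evaluate(env, policy: Callable[[History], str], b0: Dist, horizon: int) -> float:
    """Exact expected total (undiscounted) reward of a deterministic policy.

    The policy sees only the observation history; being deterministic, its
    actions are fixed by that history. `frontier` maps each observation
    history to {state: P(history, state)}.
    """
    T, O, R = env
    total = 0.0
    frontier: Dict[History, Dist] = {(): dict(b0)}
    for _ in range(horizon):
        nxt: Dict[History, Dist] = {}
        for hist, joint in frontier.items():
            a = policy(hist)
            total += sum(p * R(s, a) for s, p in joint.items())
            for s, p in joint.items():
                for s2, pt in T(s, a).items():
                    for z, pz in O(s2, a).items():
                        d = nxt.setdefault(hist + (z,), {})
                        d[s2] = d.get(s2, 0.0) + p * pt * pz
        frontier = nxt
    return total


def worst_case_value(envs, policy, b0: Dist, horizon: int) -> float:
    """Robust (max-min) criterion: the policy's value in its worst environment."""
    return min(evaluate(env, policy, b0, horizon) for env in envs)


def listen_then_open(hist: History) -> str:
    if not hist:
        return "listen"
    # Open the door opposite to where the tiger was last heard.
    return "open_right" if hist[-1] == "hear_left" else "open_left"


def always_listen(hist: History) -> str:
    return "listen"


envs = [make_tiger(0.95), make_tiger(0.6)]  # two experts disagree on accuracy
b0 = {"left": 0.5, "right": 0.5}
candidates = {"listen_then_open": listen_then_open, "always_listen": always_listen}
best = max(candidates, key=lambda n: worst_case_value(envs, candidates[n], b0, 2))
print(best)  # always_listen
```

With horizon 2, listening then opening scores 3.5 in the accurate environment but -35 in the noisy one, while always listening scores -2 in both. The worst-case criterion therefore prefers always listening, even though it is worse in the first environment.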

Related papers