Fugu-MT Paper Translation (Abstract): Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Paper overview: Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

arxiv url: http://arxiv.org/abs/2012.12732v1
Date: Wed, 23 Dec 2020 15:09:28 GMT
Status: Translation complete
System update date: 2021-04-25 23:34:25.712515
Title: Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach
Title (reference translation): Identifying Unexpected Decisions in Partially Observable Monte-Carlo Planning: A Rule-Based Approach
Authors: Giulio Mazzi, Alberto Castellini, Alessandro Farinelli
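Abstract summary: This paper proposes a method for analyzing POMCP policies by inspecting their traces. The proposed method explores local properties of policy behavior to identify unexpected decisions. The approach is evaluated on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation.
Reference score (independently computed attention score): 78.05638156687343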
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: (i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, (ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), and (iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge on the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement of the Area Under Curve of up to 47% has been achieved in our tests.
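Abstract (reference translation): Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large partially observable Markov decision processes. The online nature of this method supports scalability by avoiding a complete policy representation. However, the lack of an explicit representation hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) that analyzes POMCP policies by inspecting their traces, namely the sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: (i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, (ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), and (iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation. The results show that the approach can exploit human knowledge of the domain and outperforms state-of-the-art anomaly detection methods in identifying unexpected decisions. In our tests, an improvement in the Area Under Curve of up to 47% was achieved.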
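
The three-step process in the abstract (question, answer, analysis) can be illustrated with a small sketch. This is not the authors' implementation: it swaps the MAX-SMT solver for a brute-force search over candidate thresholds, which is only feasible because the rule template here has a single parameter. The rule template ("open the left door if and only if the belief that the tiger is on the right is at least x"), the function names, and the trace data are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One trace step: (belief that the tiger is behind the right door, action taken).
# Observations are omitted because this toy rule does not use them.
Step = Tuple[float, str]


def fit_threshold(trace: List[Step], action: str) -> Tuple[float, int]:
    """Step (ii): find x maximizing the number of trace steps that satisfy
    the hypothetical rule 'action is selected <=> belief >= x'.

    Brute force over observed belief values; a stand-in for MAX-SMT solving.
    """
    # Only observed beliefs (plus "never") can change which steps are satisfied.
    candidates = sorted({belief for belief, _ in trace}) + [float("inf")]
    best_x, best_score = candidates[0], -1
    for x in candidates:
        score = sum((belief >= x) == (taken == action) for belief, taken in trace)
        if score > best_score:  # strict '>' keeps the smallest x on ties
            best_x, best_score = x, score
    return best_x, best_score


def unexpected_steps(trace: List[Step], action: str, x: float) -> List[int]:
    """Step (iii): indices of steps that violate the fitted rule."""
    return [
        i for i, (belief, taken) in enumerate(trace)
        if (belief >= x) != (taken == action)
    ]


# Made-up trace for a Tiger-like problem (illustration only).
trace: List[Step] = [
    (0.50, "listen"), (0.85, "listen"), (0.97, "open_left"),
    (0.50, "listen"), (0.15, "listen"), (0.50, "listen"),
    (0.85, "listen"), (0.96, "open_left"), (0.60, "open_left"),
    (0.85, "listen"), (0.99, "open_left"),
]

# Step (i) is the choice of rule template and target action.
x, score = fit_threshold(trace, "open_left")
flagged = unexpected_steps(trace, "open_left", x)
print(x, score, flagged)  # 0.96 10 [8]
```

In this toy trace the fitted threshold satisfies 10 of the 11 steps, and the one remaining step (opening the door at a belief of only 0.60) is the kind of decision the abstract calls unexpected. With several parameters or richer formulas, the enumeration above would not scale, which is where an actual MAX-SMT solver becomes necessary.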

Related papers