Fugu-MT Paper Translation (Abstract): Bayesian learning for the stochastic shortest path problem

Paper overview: Bayesian learning for the stochastic shortest path problem

arxiv url: http://arxiv.org/abs/2606.04845v1
Date: Wed, 03 Jun 2026 13:13:41 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.773558
Title: Bayesian learning for the stochastic shortest path problem
Authors: Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo
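Abstract summary: We develop a Bayesian framework for learning the optimal decision strategy. It does not rely on unrealistic modelling assumptions or ad-hoc approximations. We demonstrate that our framework faithfully quantifies uncertainty.
Reference score (independently computed attention measure): 7.552707920682579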
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We develop a Bayesian framework to learn the optimal decision strategy through interactions with the decision-making task. Specifically, we learn the optimal action-value function $Q^*$, but unlike many existing Bayesian approaches, we do not rely on unrealistic modelling assumptions and ad-hoc approximations. Our approach is to directly construct the posterior beliefs for $Q^*$ through Bellman's optimality equations. For deterministic rewards, we characterise the posterior as a distribution with a manifold density. To facilitate simpler inference, we relax the likelihood so that a Lebesgue density exists. The flip side is to create unidentifiability issues. Specifically, the relaxed posterior can have significant mass on improper decision rules, while the exact posterior will not. We also calculate the exact posterior probabilities for optimal action selections for the tabular parametrisation of $Q^*$, a Gaussian likelihood relaxation and a Gaussian prior, which is useful in benchmarking studies. Numerical studies on variants of the Deep Sea benchmark verify our findings. We demonstrate that our framework faithfully quantifies uncertainty and, compared to other temporal-difference-based Bayesian methodologies, is more data efficient. We conclude with recommendations for future work.
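For readers unfamiliar with the setting, the sketch below illustrates the Bellman optimality equations for an SSP that the abstract refers to, $Q^*(s,a) = r(s,a) + \sum_{s'} P(s' \mid s,a) \max_{a'} Q^*(s',a')$ with $Q^* = 0$ at absorbing terminal states. It is not the paper's Bayesian method: it assumes the transition probabilities and rewards are known and solves for $Q^*$ by plain value iteration. The three-state MDP, its rewards, and the function name `q_value_iteration` are hypothetical choices made only for illustration.

```python
import numpy as np


def q_value_iteration(P, R, terminal, tol=1e-10, max_iter=10_000):
    """Solve the undiscounted Bellman optimality equations by fixed-point iteration.

    P: (S, A, S) array of transition probabilities P[s, a, s'].
    R: (S, A) array of expected one-step rewards.
    terminal: (S,) boolean mask of absorbing terminal states.
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)      # greedy state values
        V[terminal] = 0.0      # no further reward after absorption
        Q_new = R + P @ V      # no discount factor in the SSP setting
        Q_new[terminal] = 0.0
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


# Hypothetical toy SSP: states 0 and 1 are non-terminal, state 2 is terminal.
# Action 0 moves one state to the right at reward -1.
# Action 1 jumps to the terminal state with probability 0.5 (else stays), reward -0.4.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0
P[1, 0, 2] = 1.0
P[0, 1, 0] = 0.5
P[0, 1, 2] = 0.5
P[1, 1, 1] = 0.5
P[1, 1, 2] = 0.5
P[2, :, 2] = 1.0  # terminal state is absorbing
R = np.array([[-1.0, -0.4], [-1.0, -0.4], [0.0, 0.0]])
terminal = np.array([False, False, True])

Q = q_value_iteration(P, R, terminal)
greedy = Q.argmax(axis=1)  # optimal action per state (arbitrary at the terminal state)
print(Q)
print(greedy[:2])
```

In this toy problem every policy reaches the terminal state with probability one, so the iteration converges without discounting. The paper's setting differs in that the model is unknown and a posterior over $Q^*$ is constructed from observed interactions.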
