Fugu-MT Paper Translation (Summary): Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

Paper summary: Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

arxiv url: http://arxiv.org/abs/2603.08979v1
Date: Mon, 09 Mar 2026 22:13:38 GMT
Status: Translation complete
Last updated (in system): 2026-03-11 15:25:23.851007
Title: Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach
Authors: Sivaramakrishnan Ramani
Abstract summary: We consider Markov decision processes (MDPs) with an unknown disturbance distribution and address this problem using robust Markov decision processes (RMDPs).
Reference score (attention score computed in-house): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider Markov decision processes (MDPs) with an unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance distribution and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function as the sample size increases. We establish that, for finite sample sizes, the robust optimal value function serves as a high-probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and out-of-distribution performance bounds. The finite-sample performance guarantees rely on the distance function satisfying a certain concentration-type inequality. Several well-studied distances in the literature meet the requirements imposed on the distance function. We also analyze the data-driven properties of empirical MDPs and demonstrate that, unlike our data-driven RMDPs, empirical MDPs fail to satisfy some of the finite-sample performance guarantees.
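
The pipeline described in the abstract (an empirical distribution, an ambiguity set defined as a sublevel set of a distance from it, a concentration inequality that sets the radius, and a worst-case Bellman backup) can be illustrated with a minimal Python sketch. It is not the author's construction, which works on Borel spaces with a general distance function. The sketch assumes a finite disturbance support, uses the L1 distance as the distance function, picks the radius by inverting the L1 concentration bound of Weissman et al. (2003), and solves a hypothetical toy inventory problem under cost minimization. All function names and MDP parameters are illustrative.

```python
import math
import random
from collections import Counter


def empirical_distribution(samples, support):
    """Empirical pmf of the observed disturbances over a finite support."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[w] / n for w in support]


def l1_radius(n, k, delta):
    """Radius r with P(||p_hat - p||_1 >= r) <= delta, obtained by inverting
    the L1 concentration bound (2^k - 2) * exp(-n * r^2 / 2) (support size k >= 2)."""
    return math.sqrt(2.0 / n * math.log((2 ** k - 2) / delta))


def sample_complexity(eps, k, delta):
    """Sample size n that makes l1_radius(n, k, delta) <= eps (up to float rounding)."""
    return math.ceil(2.0 / eps ** 2 * math.log((2 ** k - 2) / delta))


def worst_case_expectation(p_hat, values, r):
    """max of sum_w q(w) * values(w) over pmfs q with ||q - p_hat||_1 <= r.
    Moves up to r/2 mass onto the costliest atom, taking it from the cheapest atoms."""
    q = list(p_hat)
    order = sorted(range(len(q)), key=lambda i: values[i])
    top = order[-1]
    budget = min(r / 2.0, 1.0 - q[top])
    q[top] += budget
    for i in order[:-1]:  # cheapest atoms give up mass first
        if budget <= 0:
            break
        take = min(budget, q[i])
        q[i] -= take
        budget -= take
    return sum(qi * vi for qi, vi in zip(q, values))


def robust_value_iteration(p_hat, support, r, gamma=0.9, max_stock=4, iters=500):
    """Robust Bellman iteration V(x) = min_a max_{q in ball} E_q[cost + gamma * V(next)]
    on a hypothetical inventory MDP: stock x, order a, demand w."""
    V = [0.0] * (max_stock + 1)
    for _ in range(iters):
        new_V = []
        for x in range(max_stock + 1):
            best = math.inf
            for a in range(max_stock - x + 1):
                vals = []
                for w in support:
                    nxt = max(x + a - w, 0)
                    # ordering cost + holding cost + lost-sales penalty
                    cost = 1.0 * a + 0.5 * nxt + 4.0 * max(w - x - a, 0)
                    vals.append(cost + gamma * V[nxt])
                best = min(best, worst_case_expectation(p_hat, vals, r))
            new_V.append(best)
        V = new_V
    return V


if __name__ == "__main__":
    random.seed(0)
    support = [0, 1, 2, 3]
    true_p = [0.1, 0.4, 0.3, 0.2]  # unknown to the decision maker
    samples = random.choices(support, weights=true_p, k=500)
    p_hat = empirical_distribution(samples, support)
    r = l1_radius(len(samples), len(support), delta=0.05)
    V_robust = robust_value_iteration(p_hat, support, r)
    V_empirical = robust_value_iteration(p_hat, support, 0.0)  # empirical MDP
    print(r, V_robust, V_empirical)
```

Setting r = 0 shrinks the ambiguity set to the empirical distribution itself, which corresponds to an empirical MDP. Because each worst-case backup is at least the empirical expectation and both Bellman operators are monotone, the robust values come out pointwise no smaller than the empirical ones in this sketch.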

Related papers