Robust Parameter Learning for Uncertain MDPs
Abstract Overview
This paper proposes a framework for learning uncertain MDPs (UMDPs) by projecting statistical uncertainty from empirical transition frequencies into the parameter space of a known parametric MDP (pMDP). Rather than treating transition probabilities as independent, the method exploits algebraic dependencies captured by shared parameters to construct a confidence region in parameter space that contains the true parameter instantiation with probability at least 1−δ. Because the resulting coupled uncertainty set is computationally challenging to solve, the authors develop a hierarchy of sound rectangular and linear outer relaxations that enable tractable robust policy synthesis. The framework also identifies when the learned constraints are jointly inconsistent with the assumed parametric structure, providing a model misspecification diagnostic.
Novelty
The key contribution is learning uncertain MDPs through the parameter space of a pMDP rather than constructing independent confidence intervals for individual transition probabilities, thereby preserving algebraic dependencies across transitions. The paper also introduces a formal hierarchy of sound relaxations—rectangular, expression-wise, and parameter-wise projections, together with McCormick-based linearizations for polynomial constraints—that make the structurally coupled uncertainty tractable for robust synthesis.
Results
The authors prove that the projected parameter uncertainty region contains the true parameter instantiation with probability at least 1−δ, ensuring robust values are PAC lower bounds on true performance. Experimentally, exploiting parametric structure yields substantially tighter uncertainty estimates than interval-based learning with parameter tying, with the expression-wise projection often matching the tightest rectangular relaxation at much lower computational cost. In online learning experiments, the tighter uncertainty models improve certified sample efficiency by up to an order of magnitude over the baseline.
Key Points
- The method projects transition-level statistical uncertainty into pMDP parameter space, producing a high-confidence uncertainty region that respects algebraic dependencies across transitions and provides a PAC inclusion guarantee.
- To address intractability of the induced coupled uncertainty, the paper develops a hierarchy of sound relaxations (rectangular, expression-wise, parameter-wise) with proven inclusion relationships, and uses McCormick envelopes for linearizing polynomial constraints.
- Across benchmark evaluations, the expression-wise projection emerges as the strongest tightness-versus-runtime compromise, often matching the tightest rectangular relaxation while maintaining computational costs comparable to baseline interval learning.