The Geometry of Robust Value Functions
- URL: http://arxiv.org/abs/2201.12929v1
- Date: Sun, 30 Jan 2022 22:12:17 GMT
- Title: The Geometry of Robust Value Functions
- Authors: Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng,
Shie Mannor
- Abstract summary: We introduce a new perspective that enables us to characterize both the non-robust and robust value space.
We show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state.
- Score: 119.94715309072983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The space of value functions is a fundamental concept in reinforcement
learning. Characterizing its geometric properties may provide insights for
optimization and representation. Existing works mainly focus on the value space
for Markov Decision Processes (MDPs). In this paper, we study the geometry of
the robust value space for the more general Robust MDPs (RMDPs) setting, where
transition uncertainties are considered. Specifically, since we find it hard to
directly adapt prior approaches to RMDPs, we start with revisiting the
non-robust case, and introduce a new perspective that enables us to
characterize both the non-robust and robust value space in a similar fashion.
The key of this perspective is to decompose the value space, in a state-wise
manner, into unions of hypersurfaces. Through our analysis, we show that the
robust value space is determined by a set of conic hypersurfaces, each of which
contains the robust values of all policies that agree on one state.
Furthermore, we find that taking only extreme points in the uncertainty set is
sufficient to determine the robust value space. Finally, we discuss some other
aspects about the robust value space, including its non-convexity and policy
agreement on multiple states.
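The state-wise decomposition and the role of extreme points can be illustrated on a toy example. The following is a minimal sketch in Python, not the paper's construction: the MDP, the two transition kernels standing in for extreme points of a finite uncertainty set, and the use of a state-wise worst case are all assumptions made for illustration. It evaluates a fixed policy exactly under each extreme kernel and takes the pointwise minimum as a stand-in for the robust value.

```python
import numpy as np

# Illustrative 2-state, 2-action discounted MDP; all numbers are assumptions.
gamma = 0.9
n_states, n_actions = 2, 2
r = np.array([[1.0, 0.0],   # r[s, a]: reward for action a in state s
              [0.0, 1.0]])

# Two transition kernels P[a, s, s'] playing the role of extreme points
# of a finite uncertainty set (assumed values; rows sum to 1).
P_extremes = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]]),
    np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.3, 0.7], [0.8, 0.2]]]),
]

def policy_value(policy, P):
    """Exact (non-robust) policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum("sa,ast->st", policy, P)  # state-to-state kernel under pi
    r_pi = (policy * r).sum(axis=1)            # expected one-step reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# A fixed stochastic policy pi[s, a] (assumed).
pi = np.array([[0.7, 0.3],
               [0.4, 0.6]])

# Value of pi under each extreme kernel, and the state-wise worst case
# used here as a simple stand-in for the robust value of pi.
values = [policy_value(pi, P) for P in P_extremes]
robust_value = np.min(values, axis=0)

print("values under each extreme kernel:", values)
print("worst-case value of pi:", robust_value)
```

Sweeping over many policies and plotting the resulting value vectors (V(s1), V(s2)) would trace out the value space the abstract describes; repeating this with the worst-case values sketches its robust counterpart.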
Related papers
- CWF: Consolidating Weak Features in High-quality Mesh Simplification [50.634070540791555]
We propose a smooth functional that simultaneously considers all of these requirements.
The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term.
arXiv Detail & Related papers (2024-04-24T05:37:17Z) - Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP).
We use transition observations from the original MDP, whether they are generated under the same or a different policy.
Our estimator also permits statistical inference via Wald confidence intervals.
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based
Constraints [19.897181782914437]
We propose a principled and simple solution based on equality constraints on the logit values, which enables explicit control of both the enforced constraint and the weight of the penalty.
Our approach can be used to train a wide span of deep segmentation networks.
arXiv Detail & Related papers (2024-01-25T19:46:57Z) - Trust your neighbours: Penalty-based constraints for model calibration [19.437451462590108]
We present a constrained optimization perspective of SVLS and demonstrate that it enforces an implicit constraint on soft class proportions of surrounding pixels.
We propose a principled and simple solution based on equality constraints on the logit values, which enables explicit control of both the enforced constraint and the weight of the penalty.
arXiv Detail & Related papers (2023-03-11T01:10:26Z) - Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z) - Geometric Policy Iteration for Markov Decision Processes [4.746723775952672]
Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning.
We propose a new algorithm, Geometric Policy Iteration, to solve discounted MDPs.
arXiv Detail & Related papers (2022-06-12T18:15:24Z) - Spatial and Semantic Consistency Regularizations for Pedestrian
Attribute Recognition [50.932864767867365]
We propose a framework that consists of two complementary regularizations to achieve spatial and semantic consistency for each attribute.
Based on the precise attribute locations, we propose a semantic consistency regularization to extract intrinsic and discriminative semantic features.
Results show that the proposed method performs favorably against state-of-the-art methods without increasing parameters.
arXiv Detail & Related papers (2021-09-13T03:36:44Z) - Near Optimality of Finite Memory Feedback Policies in Partially Observed
Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)