Related papers: A Covering Framework for Offline POMDPs Learning using Belief Space Metric

URL: http://arxiv.org/abs/2603.03191v1
Date: Tue, 03 Mar 2026 17:48:20 GMT
Title: A Covering Framework for Offline POMDPs Learning using Belief Space Metric
Authors: Youheng Zhu, Yiping Lu
Abstract summary: This paper introduces a novel covering analysis framework that exploits the intrinsic metric structure of the belief space. By assuming value-relevant functions are Lipschitz continuous in the belief space, we derive error bounds that mitigate exponential blow-ups in horizon and memory length.
Score: 3.540245474029962
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In off-policy evaluation (OPE) for partially observable Markov decision processes (POMDPs), an agent must infer hidden states from past observations, which exacerbates both the curse of horizon and the curse of memory in existing OPE methods. This paper introduces a novel covering analysis framework that exploits the intrinsic metric structure of the belief space (distributions over latent states) to relax traditional coverage assumptions. By assuming value-relevant functions are Lipschitz continuous in the belief space, we derive error bounds that mitigate exponential blow-ups in horizon and memory length. Our unified analysis technique applies to a broad class of OPE algorithms, yielding concrete error bounds and coverage requirements expressed in terms of belief space metrics rather than raw history coverage. We illustrate the improved sample efficiency of this framework via two case studies: the double-sampling Bellman error minimization algorithm and the memory-based future-dependent value functions (FDVF). In both cases, our coverage definition based on the belief space metric yields tighter bounds.

Related papers

Causal Imitation Learning Under Measurement Error and Distribution Shift [6.038778620145853]
We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements. We propose a general framework for IL under measurement error, inspired by explicitly modeling the causal relationships among the variables.
arXiv Detail & Related papers (2026-01-29T18:06:53Z)
A Unifying View of Coverage in Linear Off-Policy Evaluation [36.79977028763131]
We provide a novel finite-sample analysis of a canonical algorithm for this setting, LSTDQ. Inspired by an instrumental-variable view, we develop error bounds that depend on a novel coverage parameter, the feature-dynamics coverage.
arXiv Detail & Related papers (2026-01-26T23:30:24Z)
Finite Memory Belief Approximation for Optimal Control in Partially Observable Markov Decision Processes [1.614301262383079]
We study finite memory belief approximation for partially observable (PO) optimal control (SOC) problems. We develop a metric-based theory that directly relates information loss to control performance.
arXiv Detail & Related papers (2026-01-06T16:05:20Z)
Conditional Coverage Diagnostics for Conformal Prediction [47.93989136542648]
We show that conditional coverage estimation can be a classification problem. We call the resulting family of metrics excess risk of the target coverage (ERT). We release an open-source package for ERT as well as previous conditional coverage metrics.
arXiv Detail & Related papers (2025-12-12T18:47:39Z)
Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss. Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation [11.829110453985228]
We develop estimators whose guarantee avoids exponential dependence on the horizon. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs. As a side product, our analyses also lead to the discovery of new algorithms with complementary properties.
arXiv Detail & Related papers (2024-02-22T17:00:50Z)
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors. Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP). We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders. We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
Towards Certified Robustness of Distance Metric Learning [53.96113074344632]
We advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms. We show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness.
arXiv Detail & Related papers (2020-06-10T16:51:53Z)
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.