Related papers: Monitoring State Transitions in Markovian Systems with Sampling Cost

URL: http://arxiv.org/abs/2510.22327v1
Date: Sat, 25 Oct 2025 15:07:37 GMT
Title: Monitoring State Transitions in Markovian Systems with Sampling Cost
Authors: Kumar Saurav, Ness B. Shroff, Yingbin Liang
Abstract summary: A natural approach is a greedy policy that predicts when the expected prediction loss is below the query cost and queries otherwise. We analyze this policy in a Markovian setting, where the optimal (OPT) strategy is a state-dependent threshold policy. For the case of unknown transition probabilities, we propose a projected stochastic gradient descent (PSGD)-based learning variant of the greedy policy.
Score: 65.4151496405543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a node-monitor pair, where the node's state varies with time. The monitor needs to track the node's state at all times; however, there is a fixed cost for each state query. So the monitor may instead predict the state using time-series forecasting methods, including time-series foundation models (TSFMs), and query only when prediction uncertainty is high. Since query decisions influence prediction accuracy, determining when to query is nontrivial. A natural approach is a greedy policy that predicts when the expected prediction loss is below the query cost and queries otherwise. We analyze this policy in a Markovian setting, where the optimal (OPT) strategy is a state-dependent threshold policy minimizing the time-averaged sum of query cost and prediction losses. We show that, in general, the greedy policy is suboptimal and can have an unbounded competitive ratio, but under common conditions such as identically distributed transition probabilities, it performs close to OPT. For the case of unknown transition probabilities, we further propose a projected stochastic gradient descent (PSGD)-based learning variant of the greedy policy, which achieves a favorable predict-query tradeoff with improved computational efficiency compared to OPT.
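
The greedy rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a known transition matrix `P`, a 0-1 prediction loss (so the expected loss of the best guess is one minus the largest belief entry), and a monitor that tracks a belief vector over states. The names `step_belief` and `greedy_monitor` are hypothetical.

```python
import random


def step_belief(belief, P):
    """Propagate the monitor's belief one step through the chain: b' = b P."""
    n = len(P)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def greedy_monitor(P, query_cost, horizon, start_state=0, seed=0):
    """Simulate the greedy predict-or-query rule (assumed 0-1 loss, known P).

    Returns (time-averaged cost, number of queries).
    """
    rng = random.Random(seed)
    n = len(P)
    state = start_state
    # The monitor starts out knowing the initial state exactly.
    belief = [1.0 if i == start_state else 0.0 for i in range(n)]
    total_cost = 0.0
    queries = 0
    for _ in range(horizon):
        # The node's true state evolves according to row P[state].
        state = rng.choices(range(n), weights=P[state])[0]
        belief = step_belief(belief, P)
        # Under 0-1 loss, guessing the most likely state errs w.p. 1 - max(belief).
        expected_loss = 1.0 - max(belief)
        if expected_loss < query_cost:
            # Predict: pay the realized loss; the belief is not reset.
            guess = max(range(n), key=lambda j: belief[j])
            total_cost += 0.0 if guess == state else 1.0
        else:
            # Query: pay the fixed cost and learn the true state.
            total_cost += query_cost
            queries += 1
            belief = [1.0 if i == state else 0.0 for i in range(n)]
    return total_cost / horizon, queries
```

For example, `greedy_monitor([[0.9, 0.1], [0.2, 0.8]], query_cost=0.3, horizon=1000)` returns the average cost and query count for a two-state chain. The sketch only shows the decision rule; it does not compute the state-dependent threshold policy (OPT) or the PSGD-based learning variant discussed in the abstract.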

Related papers

Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions [11.849924812127371]
We show the power of prediction in enabling cost-efficient scheduling and its sensitivity to estimation errors. We propose an online allocation algorithm with prediction based on the committed horizon control approach. An online policy selection algorithm is developed that learns the best policy from a pool constructed by varying the parameters of both algorithms.
arXiv Detail & Related papers (2025-12-24T05:47:27Z)
Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences [51.56484100374058]
We study stopping rules for stochastic gradient descent (SGD) for convex optimization. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD. These are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates.
arXiv Detail & Related papers (2025-12-15T09:26:45Z)
Distribution-informed Online Conformal Prediction [53.674678995825666]
We propose Conformal Optimistic Prediction (COP), an online conformal prediction algorithm that incorporates the underlying data pattern into the update rule. COP produces tighter prediction sets when a predictable pattern exists, while retaining valid coverage guarantees even when estimates are inaccurate. We prove that COP can achieve valid coverage and construct shorter prediction intervals than other baselines.
arXiv Detail & Related papers (2025-12-08T17:51:49Z)
Best-Effort Policies for Robust Markov Decision Processes [69.60742680559788]
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a small overhead compared to standard robust value iteration.
arXiv Detail & Related papers (2025-08-11T09:18:34Z)
Optimal Conformal Prediction under Epistemic Uncertainty [61.46247583794497]
Conformal prediction (CP) is a popular framework for representing uncertainty. We introduce Bernoulli prediction sets (BPS), which produce the smallest prediction sets that ensure conditional coverage. When given first-order predictions, BPS reduces to the well-known adaptive prediction sets (APS).
arXiv Detail & Related papers (2025-05-25T08:32:44Z)
Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding [52.1068936424622]
We consider the problem of estimating the expected causal effect $E[Y|do(X)]$ for a target variable $Y$ when treatment $X$ is set by intervention. In settings without selection bias or confounding, $E[Y|do(X)] = E[Y|X]$, which can be estimated using standard regression methods. We propose a framework that incorporates both selection bias and confounding.
arXiv Detail & Related papers (2025-03-26T13:43:37Z)
Kernel-based Optimally Weighted Conformal Prediction Intervals [12.814084012624916]
We present Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. We demonstrate the superior performance of KOWCPI on real and synthetic time-series data against state-of-the-art methods.
arXiv Detail & Related papers (2024-05-27T04:49:41Z)
Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP). We use transition observations from the original MDP, whether they are generated under the same or a different policy. Our estimator also supports statistical inference using Wald confidence intervals.
arXiv Detail & Related papers (2024-03-29T18:11:49Z)
Post-selection Inference for Conformal Prediction: Trading off Coverage for Precision [0.0]
Traditionally, conformal prediction inference requires a data-independent specification of the miscoverage level. We develop simultaneous conformal inference to account for data-dependent miscoverage levels.
arXiv Detail & Related papers (2023-04-12T20:56:43Z)
Planning in Observable POMDPs in Quasipolynomial Time [21.03037504572896]
We develop a quasipolynomial-time algorithm for planning in observable POMDPs. We assume that well-separated distributions on states lead to well-separated distributions on observations. We prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
arXiv Detail & Related papers (2022-01-12T23:16:37Z)
Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO). We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous. In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist. We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.