Offline Estimation of Controlled Markov Chains: Minimaxity and Sample
Complexity
- URL: http://arxiv.org/abs/2211.07092v4
- Date: Fri, 26 Jan 2024 20:23:18 GMT
- Title: Offline Estimation of Controlled Markov Chains: Minimaxity and Sample
Complexity
- Authors: Imon Banerjee, Harsha Honnappa, Vinayak Rao
- Abstract summary: We develop sample complexity bounds for the estimator and establish conditions for minimaxity.
We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples.
- Score: 8.732260277121547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study a natural nonparametric estimator of the transition
probability matrices of a finite controlled Markov chain. We consider an
offline setting with a fixed dataset, collected using a so-called logging
policy. We develop sample complexity bounds for the estimator and establish
conditions for minimaxity. Our statistical bounds depend on the logging policy
through its mixing properties. We show that achieving a particular statistical
risk bound involves a subtle and interesting trade-off between the strength of
the mixing properties and the number of samples. We demonstrate the validity of
our results under various examples, such as ergodic Markov chains, weakly
ergodic inhomogeneous Markov chains, and controlled Markov chains with
non-stationary Markov, episodic, and greedy controls. Lastly, we use these
sample complexity bounds to establish concomitant ones for offline evaluation
of stationary Markov control policies.
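As a concrete illustration of the setting, the natural nonparametric estimator of a finite controlled Markov chain's transition matrices is the empirical conditional frequency of observed transitions. The sketch below (function and variable names are hypothetical, not the authors' code) builds this count-based estimate from a single offline trajectory collected under some logging policy, falling back to a uniform row for state-action pairs the logging policy never visited:

```python
import numpy as np

def estimate_transition_matrices(trajectory, n_states, n_actions):
    """Count-based estimate of the transition matrices P(s' | s, a) of a
    finite controlled Markov chain, from one offline trajectory of
    (state, action) pairs [(s_0, a_0), (s_1, a_1), ...] collected under a
    logging policy.  Illustrative sketch only, not the paper's code.
    """
    counts = np.zeros((n_actions, n_states, n_states))
    # Count each observed transition (s_t, a_t) -> s_{t+1}.
    for (s, a), (s_next, _) in zip(trajectory[:-1], trajectory[1:]):
        counts[a, s, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Normalize visited (s, a) rows; use a uniform row where (s, a) was
    # never visited, so every row is a valid probability distribution.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

# Example: a short logged trajectory over 3 states and 2 actions.
traj = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 0), (1, 1)]
P_hat = estimate_transition_matrices(traj, n_states=3, n_actions=2)
```

Each slice `P_hat[a]` is then a row-stochastic matrix, so the estimate can be plugged directly into downstream tasks such as the offline evaluation of a stationary Markov control policy mentioned in the abstract.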
Related papers
- Hoeffding's Inequality for Markov Chains under Generalized
Concentrability Condition [15.228649445346473]
This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via integral probability metrics (IPMs).
The flexibility of our framework allows Hoeffding's inequality to be applied beyond the ergodic Markov chains in the traditional sense.
arXiv Detail & Related papers (2023-10-04T16:21:23Z) - A Tale of Sampling and Estimation in Discounted Reinforcement Learning [50.43256303670011]
We present a minimax lower bound on the discounted mean estimation problem.
We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
arXiv Detail & Related papers (2023-04-11T09:13:17Z) - Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the algorithm only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z) - A Geometric Reduction Approach for Identity Testing of Reversible Markov
Chains [25.33133112984769]
We consider the problem of testing the identity of a reversible Markov chain against a reference from a single trajectory of observations.
We show that, at least in a mildly restricted setting, testing identity to a reversible chain reduces to testing identity to a symmetric chain over a larger state space.
arXiv Detail & Related papers (2023-02-16T03:41:39Z) - Breaking the Spurious Causality of Conditional Generation via Fairness
Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z) - Leveraging Instance Features for Label Aggregation in Programmatic Weak
Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources as labeling functions.
Existing statistical label models typically rely only on the outputs of the LFs, ignoring the instance features when modeling the underlying generative process.
arXiv Detail & Related papers (2022-10-06T07:28:53Z) - Semi-Supervised Clustering via Markov Chain Aggregation [9.475039534437332]
We introduce Constrained Markov Clustering (CoMaC) for semi-supervised clustering.
Our results indicate that CoMaC is competitive with the state-of-the-art.
arXiv Detail & Related papers (2021-12-17T09:07:43Z) - Comparison of Markov chains via weak Poincaré inequalities with
application to pseudo-marginal MCMC [0.0]
We investigate the use of a certain class of functional inequalities known as weak Poincaré inequalities to bound convergence of Markov chains to equilibrium.
We show that this enables the derivation of subgeometric convergence bounds for methods such as the Independent Metropolis--Hastings sampler and pseudo-marginal methods for intractable likelihoods.
arXiv Detail & Related papers (2021-12-10T15:36:30Z) - Exploiting Sample Uncertainty for Domain Adaptive Person
Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z) - Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states [48.705095800341944]
Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler.
This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches.
We propose a targeted sub-sampling approach that over-samples observations corresponding to rare latent states when calculating the gradient of parameters associated with them.
arXiv Detail & Related papers (2018-10-31T17:44:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.