Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence
- URL: http://arxiv.org/abs/2404.05819v1
- Date: Mon, 8 Apr 2024 18:55:07 GMT
- Title: Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence
- Authors: Ashwin Pananjady, Vidya Muthukumar, Andrew Thangaraj,
- Abstract summary: We develop a linear-runtime estimator called emphWindowed Good--Turing (textscWingIt)
We show that its risk decays as $widetildemathcalO(mathsfT_mix/n)$, where $mathsfT_mix$ denotes the mixing time of the chain in total variation distance.
We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $Xn$.
- Score: 13.552044856329648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of estimating the stationary mass -- also called the unigram mass -- that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications -- for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good--Turing estimator from the 1950s has appealing properties for i.i.d. data, it is known to be biased in the Markov setting, and other heuristic estimators do not come equipped with guarantees. Operating in the general setting in which the size of the state space may be much larger than the length $n$ of the trajectory, we develop a linear-runtime estimator called \emph{Windowed Good--Turing} (\textsc{WingIt}) and show that its risk decays as $\widetilde{\mathcal{O}}(\mathsf{T_{mix}}/n)$, where $\mathsf{T_{mix}}$ denotes the mixing time of the chain in total variation distance. Notably, this rate is independent of the size of the state space and minimax-optimal up to a logarithmic factor in $n / \mathsf{T_{mix}}$. We also present a bound on the variance of the missing mass random variable, which may be of independent interest. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $X^n$. Finally, we demonstrate the efficacy of our estimators both in simulations on canonical chains and on sequences constructed from a popular natural language corpus.
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Dynamical System Identification, Model Selection and Model Uncertainty Quantification by Bayesian Inference [0.8388591755871735]
This study presents a Bayesian maximum textitaposteriori (MAP) framework for dynamical system identification from time-series data.
arXiv Detail & Related papers (2024-01-30T12:16:52Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the beta-potential term associated with the so-called $beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Estimating 2-Sinkhorn Divergence between Gaussian Processes from
Finite-Dimensional Marginals [4.416484585765028]
We study the convergence of estimating the 2-Sinkhorn divergence between emphGaussian processes (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
arXiv Detail & Related papers (2021-02-05T16:17:55Z) - Random quantum circuits anti-concentrate in log depth [118.18170052022323]
We study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated.
Our definition of anti-concentration is that the expected collision probability is only a constant factor larger than if the distribution were uniform.
In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n log(n)) gates are also sufficient.
arXiv Detail & Related papers (2020-11-24T18:44:57Z) - Spectral statistics in constrained many-body quantum chaotic systems [0.0]
We study the spectral statistics of spatially-extended many-body quantum systems with on-site Abelian symmetries or local constraints.
In particular, we analytically argue that in a system of length $L$ that conserves the $mth$ multipole moment, $t_mathrmTh$ scales subdiffusively as $L2(m+1)$.
arXiv Detail & Related papers (2020-09-24T17:59:57Z) - A metric on directed graphs and Markov chains based on hitting
probabilities [0.0]
We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain.
Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity.
Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics.
arXiv Detail & Related papers (2020-06-25T15:25:05Z) - A Concentration of Measure and Random Matrix Approach to Large
Dimensional Robust Statistics [45.24358490877106]
This article studies the emphrobust covariance matrix estimation of a data collection $X = (x_1,ldots,x_n)$ with $x_i = sqrt tau_i z_i + m$.
We exploit this semi-metric along with concentration of measure arguments to prove the existence and uniqueness of the robust estimator as well as evaluate its limiting spectral distribution.
arXiv Detail & Related papers (2020-06-17T09:02:26Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.