Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence
- URL: http://arxiv.org/abs/2404.05819v2
- Date: Sat, 05 Oct 2024 16:02:10 GMT
- Title: Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence
- Authors: Ashwin Pananjady, Vidya Muthukumar, Andrew Thangaraj,
- Abstract summary: We develop a linear-runtime estimator called Windowed Good--Turing (WingIt)
We show that its risk decays as $widetildeO(mathsfT_mix/n)$, where $mathsfT_mix$ denotes the mixing time of the chain in total variation distance.
We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in the trajectory.
- Score: 13.552044856329648
- License:
- Abstract: We study the problem of estimating the stationary mass -- also called the unigram mass -- that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications -- for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good--Turing estimator from the 1950s has appealing properties for i.i.d. data, it is known to be biased in the Markovian setting, and other heuristic estimators do not come equipped with guarantees. Operating in the general setting in which the size of the state space may be much larger than the length $n$ of the trajectory, we develop a linear-runtime estimator called Windowed Good--Turing (WingIt) and show that its risk decays as $\widetilde{O}(\mathsf{T_{mix}}/n)$, where $\mathsf{T_{mix}}$ denotes the mixing time of the chain in total variation distance. Notably, this rate is independent of the size of the state space and minimax-optimal up to a logarithmic factor in $n / \mathsf{T_{mix}}$. We also present an upper bound on the variance of the missing mass random variable, which may be of independent interest. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in the trajectory. Finally, we demonstrate the efficacy of our estimators both in simulations on canonical chains and on sequences constructed from natural language text.
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $sqrt n $-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent parameters smoothing.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Large Non-Stationary Noisy Covariance Matrices: A Cross-Validation
Approach [1.90365714903665]
We introduce a novel covariance estimator that exploits the heteroscedastic nature of financial time series.
By attenuating the noise from both the cross-sectional and time-series dimensions, we empirically demonstrate the superiority of our estimator over competing estimators.
arXiv Detail & Related papers (2020-12-10T15:41:17Z) - Random quantum circuits anti-concentrate in log depth [118.18170052022323]
We study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated.
Our definition of anti-concentration is that the expected collision probability is only a constant factor larger than if the distribution were uniform.
In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n log(n)) gates are also sufficient.
arXiv Detail & Related papers (2020-11-24T18:44:57Z) - Spectral statistics in constrained many-body quantum chaotic systems [0.0]
We study the spectral statistics of spatially-extended many-body quantum systems with on-site Abelian symmetries or local constraints.
In particular, we analytically argue that in a system of length $L$ that conserves the $mth$ multipole moment, $t_mathrmTh$ scales subdiffusively as $L2(m+1)$.
arXiv Detail & Related papers (2020-09-24T17:59:57Z) - A metric on directed graphs and Markov chains based on hitting
probabilities [0.0]
We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain.
Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity.
Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics.
arXiv Detail & Related papers (2020-06-25T15:25:05Z) - Empirical and Instance-Dependent Estimation of Markov Chain and Mixing
Time [5.167069404528051]
We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations.
Unlike most previous works which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation.
This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains.
arXiv Detail & Related papers (2019-12-14T13:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.