Related papers: Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence

Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence

URL: http://arxiv.org/abs/2404.05819v1
Date: Mon, 8 Apr 2024 18:55:07 GMT
Title: Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence
Authors: Ashwin Pananjady, Vidya Muthukumar, Andrew Thangaraj,
Abstract summary: We develop a linear-runtime estimator called emphWindowed Good--Turing (textscWingIt) We show that its risk decays as $widetildemathcalO(mathsfT_mix/n)$, where $mathsfT_mix$ denotes the mixing time of the chain in total variation distance. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $Xn$.
Score: 13.552044856329648
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of estimating the stationary mass -- also called the unigram mass -- that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications -- for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good--Turing estimator from the 1950s has appealing properties for i.i.d. data, it is known to be biased in the Markov setting, and other heuristic estimators do not come equipped with guarantees. Operating in the general setting in which the size of the state space may be much larger than the length $n$ of the trajectory, we develop a linear-runtime estimator called \emph{Windowed Good--Turing} (\textsc{WingIt}) and show that its risk decays as $\widetilde{\mathcal{O}}(\mathsf{T_{mix}}/n)$, where $\mathsf{T_{mix}}$ denotes the mixing time of the chain in total variation distance. Notably, this rate is independent of the size of the state space and minimax-optimal up to a logarithmic factor in $n / \mathsf{T_{mix}}$. We also present a bound on the variance of the missing mass random variable, which may be of independent interest. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $X^n$. Finally, we demonstrate the efficacy of our estimators both in simulations on canonical chains and on sequences constructed from a popular natural language corpus.

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
Estimating stationary mass, frequency by frequency [11.476508212290275]
We consider the problem of estimating the probability mass placed by the stationary distribution of any $alpha$-mixing process. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$. We develop complementary tools -- including concentration inequalities for a natural self-normalized Poisson mixing sequences -- that may prove independently useful in the design and analysis of estimators for related problems.
arXiv Detail & Related papers (2025-03-17T04:24:21Z)
Non-Gaussianities in Collider Metric Binning [0.0]
Metrics for rigorously defining a distance between two events have been used to study the properties of the dataspace manifold of particle collider physics. We define a robust measure of the non-Gaussianity of the bin-by-bin statistics of the distance distribution. We demonstrate in simulated data of jets from quantum chromodynamics sensitivity to the parton-to-hadron transition and that the manifold of events enjoys enhanced symmetries as their energy increases.
arXiv Detail & Related papers (2025-03-05T19:00:00Z)
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond [28.931489333515618]
Given an unnormalized probability density $piproptomathrme-V$, estimating its normalizing constant $Z=int_mathbbRdmathrme-V(x)mathrmdx$ or free energy $F=-log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. We propose a new normalizing constant estimation algorithm based on reverse diffusion samplers and establish a framework for analyzing its complexity.
arXiv Detail & Related papers (2025-02-07T00:05:28Z)
Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $sqrt n $-rate. We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent parameters smoothing.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs. We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint. We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold. It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Robust computation of optimal transport by $\eta$-potential regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. We propose regularizing OT with the beta-potential term associated with the so-called $beta$-divergence. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z)
Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Large Non-Stationary Noisy Covariance Matrices: A Cross-Validation Approach [1.90365714903665]
We introduce a novel covariance estimator that exploits the heteroscedastic nature of financial time series. By attenuating the noise from both the cross-sectional and time-series dimensions, we empirically demonstrate the superiority of our estimator over competing estimators.
arXiv Detail & Related papers (2020-12-10T15:41:17Z)
Random quantum circuits anti-concentrate in log depth [118.18170052022323]
We study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated. Our definition of anti-concentration is that the expected collision probability is only a constant factor larger than if the distribution were uniform. In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n log(n)) gates are also sufficient.
arXiv Detail & Related papers (2020-11-24T18:44:57Z)
Spectral statistics in constrained many-body quantum chaotic systems [0.0]
We study the spectral statistics of spatially-extended many-body quantum systems with on-site Abelian symmetries or local constraints. In particular, we analytically argue that in a system of length $L$ that conserves the $mth$ multipole moment, $t_mathrmTh$ scales subdiffusively as $L2(m+1)$.
arXiv Detail & Related papers (2020-09-24T17:59:57Z)
A metric on directed graphs and Markov chains based on hitting probabilities [0.0]
We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics.
arXiv Detail & Related papers (2020-06-25T15:25:05Z)
Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time [5.167069404528051]
We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations. Unlike most previous works which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation. This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains.
arXiv Detail & Related papers (2019-12-14T13:38:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.