Estimating stationary mass, frequency by frequency
- URL: http://arxiv.org/abs/2503.12808v2
- Date: Tue, 18 Mar 2025 03:07:51 GMT
- Title: Estimating stationary mass, frequency by frequency
- Authors: Milind Nakul, Vidya Muthukumar, Ashwin Pananjady
- Abstract summary: We consider the problem of estimating the probability mass placed by the stationary distribution of an $\alpha$-mixing process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$. We develop complementary tools -- including concentration inequalities for a natural self-normalized statistic of mixing sequences -- that may prove independently useful in the design and analysis of estimators for related problems.
- Score: 11.476508212290275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Suppose we observe a trajectory of length $n$ from an $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary distribution of any such process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$ and recovering known results for i.i.d. sequences as special cases. Our proposed methodology carefully combines the plug-in (or empirical) estimator with a recently proposed modification of the Good--Turing estimator called WingIt, which was originally developed for Markovian sequences. En route to controlling the error of our estimator, we develop new performance bounds on WingIt and the plug-in estimator for $\alpha$-mixing stochastic processes. Importantly, the extensively used method of Poissonization can no longer be applied in our non-i.i.d. setting, and so we develop complementary tools -- including concentration inequalities for a natural self-normalized statistic of mixing sequences -- that may prove independently useful in the design and analysis of estimators for related problems.
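To make the idea concrete, here is a minimal sketch of a leave-a-window-out Good--Turing estimator in the spirit of WingIt. This is a simplified illustration, not the authors' exact estimator: the function name and the choice of window width `w` (which should scale with the mixing time) are hypothetical.

```python
from collections import defaultdict

def wingit_missing_mass(seq, w):
    """Simplified leave-a-window-out Good--Turing sketch (hypothetical).

    A position i contributes to the missing-mass estimate when seq[i]
    occurs nowhere in the trajectory outside the window [i - w, i + w];
    discounting a window around each occurrence mimics Good--Turing's
    singleton count while tolerating the short-range dependence of a
    mixing process.
    """
    n = len(seq)
    occurrences = defaultdict(list)
    for i, x in enumerate(seq):
        occurrences[x].append(i)
    hits = sum(
        1 for i, x in enumerate(seq)
        if all(i - w <= j <= i + w for j in occurrences[x])
    )
    return hits / n
```

With `w = 0` this reduces to the classical Good--Turing missing-mass estimate $N_1/n$ (the fraction of positions holding a singleton), consistent with the i.i.d. special case mentioned in the abstract.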
Related papers
- Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores.
We flexibly estimate the joint cumulative distribution function (CDF) of the scores.
Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Quasi-Bayesian sequential deconvolution [7.10052009802944]
We develop a principled sequential approach to estimate $f$ in a streaming or online domain.
Local and uniform Gaussian central limit theorems for $f_n$ are established, leading to credible intervals and bands for $f$.
An empirical validation of our methods is presented on synthetic and real data.
arXiv Detail & Related papers (2024-08-26T16:40:04Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
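For readers unfamiliar with the quantile-regression baseline this paper relaxes: interval endpoints are fit by minimizing the pinball loss, whose expectation is minimized at the target quantile. A minimal sketch (function name hypothetical, not from the paper):

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Pinball (quantile) loss for a candidate quantile q_hat at level tau.

    Its expectation over y is minimized when q_hat equals the
    tau-quantile of y, which is why minimizing it yields quantile
    estimates and, at tau = alpha/2 and 1 - alpha/2, interval endpoints.
    """
    diff = y - q_hat
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))
```

RQR's point is that fixing the two endpoints to specific quantile levels is an arbitrary constraint; the loss above is only the standard construction it relaxes.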
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence [13.552044856329648]
We develop a linear-runtime estimator called Windowed Good--Turing (WingIt)
We show that its risk decays as $\widetilde{O}(\mathsf{T}_{\mathrm{mix}}/n)$, where $\mathsf{T}_{\mathrm{mix}}$ denotes the mixing time of the chain in total variation distance.
We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in the trajectory.
arXiv Detail & Related papers (2024-04-08T18:55:07Z) - Robust computation of optimal transport by $\beta$-potential regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
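A toy version of the clipping idea, for the simplest case of streaming mean estimation (function name, step-size schedule, and clip level are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def clipped_streaming_mean(stream, clip, lr=0.5):
    """Toy clipped-SGD estimator of a mean from a sample stream.

    Each sample contributes the gradient of 0.5 * (theta - x)**2,
    clipped so that a single heavy-tailed outlier cannot move the
    estimate by more than lr * clip at any step.
    """
    theta = 0.0
    for t, x in enumerate(stream, start=1):
        grad = theta - x                      # d/dtheta of 0.5*(theta - x)**2
        grad = float(np.clip(grad, -clip, clip))
        theta -= (lr / np.sqrt(t)) * grad     # decaying step size
    return theta
```

The clip level trades bias against robustness: too small and the estimate converges slowly, too large and heavy-tailed noise dominates, which is why the paper's more nuanced noise condition matters.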
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Faster Wasserstein Distance Estimation with the Sinkhorn Divergence [0.0]
The squared Wasserstein distance is a quantity to compare probability distributions in a non-parametric setting.
In this work, we propose instead to estimate it with the Sinkhorn divergence.
We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels.
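A minimal sketch of the (debiased) Sinkhorn divergence between two point clouds, assuming uniform weights, squared Euclidean cost, and a fixed iteration count; this illustrates the estimator the paper studies, not its analysis:

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps, iters=500):
    """Entropic OT cost <P, C> via plain Sinkhorn matrix scaling."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]           # entropic transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, eps=0.1, iters=500):
    """Debiased Sinkhorn divergence between point clouds x and y:
    S(x, y) = OT_eps(x, y) - 0.5 * OT_eps(x, x) - 0.5 * OT_eps(y, y)."""
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    cost = lambda p, q: np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return (sinkhorn_cost(a, b, cost(x, y), eps, iters)
            - 0.5 * sinkhorn_cost(a, a, cost(x, x), eps, iters)
            - 0.5 * sinkhorn_cost(b, b, cost(y, y), eps, iters))
```

The two self-transport terms remove the entropic bias, which is what lets the regularization level $\varepsilon$ be taken larger than for the raw entropic cost.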
arXiv Detail & Related papers (2020-06-15T06:58:16Z) - Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z) - Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z) - Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time [5.167069404528051]
We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations.
Unlike most previous works which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation.
This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains.
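To illustrate the quantity in question (the paper estimates it from a single trajectory; this sketch just computes it from a known transition matrix, with a hypothetical function name):

```python
import numpy as np
from itertools import combinations

def tv_contraction(P):
    """Dobrushin contraction coefficient of a row-stochastic matrix P:
    the largest total-variation distance between any two rows.

    A value gamma < 1 certifies geometric mixing: TV distance to
    stationarity shrinks by a factor gamma per step, so the mixing
    time is at most log(1/eps) / log(1/gamma), reversible or not.
    """
    return max(
        0.5 * float(np.abs(P[i] - P[j]).sum())
        for i, j in combinations(range(P.shape[0]), 2)
    )
```

This is exactly the sense in which contraction in total variation controls mixing up to universal constants, whereas spectral-gap bounds require reversibility.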
arXiv Detail & Related papers (2019-12-14T13:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.