Gaussian credible intervals in Bayesian nonparametric estimation of the unseen
- URL: http://arxiv.org/abs/2501.16008v1
- Date: Mon, 27 Jan 2025 12:48:05 GMT
- Authors: Claudia Contardi, Emanuele Dolera, Stefano Favaro
- Abstract summary: The unseen-species problem assumes $n\geq1$ samples from a population of individuals belonging to different species, possibly infinite.
We propose a novel methodology to derive large-$m$ credible intervals for $K_{n,m}$, for any $n\geq1$.
- Score: 7.54430260415628
- Abstract: The unseen-species problem assumes $n\geq1$ samples from a population of individuals belonging to different species, possibly infinite, and calls for estimating the number $K_{n,m}$ of hitherto unseen species that would be observed if $m\geq1$ new samples were collected from the same population. This is a long-standing problem in statistics, which has gained renewed relevance in biological and physical sciences, particularly in settings with large values of $n$ and $m$. In this paper, we adopt a Bayesian nonparametric approach to the unseen-species problem under the Pitman-Yor prior, and propose a novel methodology to derive large-$m$ asymptotic credible intervals for $K_{n,m}$, for any $n\geq1$. By leveraging a Gaussian central limit theorem for the posterior distribution of $K_{n,m}$, our method improves upon competitors in two key aspects: first, it enables the full parameterization of the Pitman-Yor prior, including the Dirichlet prior; second, it avoids the need for Monte Carlo sampling, enhancing computational efficiency. We validate the proposed method on synthetic and real data, demonstrating that it improves the empirical performance of competitors by significantly narrowing the gap between asymptotic and exact credible intervals for any $m\geq1$.
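To make the quantity $K_{n,m}$ concrete, here is a minimal Python sketch of the Monte Carlo baseline that the abstract says the proposed method avoids. It is not the authors' method. It assumes only the standard Pitman-Yor predictive rule: after $n$ samples with $k$ distinct species, the next sample is a new species with probability $(\theta + \alpha k)/(\theta + n)$, with $0 < \alpha < 1$. The function names, parameter values, and the Gaussian interval built from simulated moments are illustrative assumptions; the paper's closed-form central limit theorem is not reproduced here.

```python
import math
import random
import statistics


def expected_unseen(n, k, m, alpha, theta):
    """Posterior mean of K_{n,m} under a Pitman-Yor(alpha, theta) prior, 0 < alpha < 1.

    Follows from iterating the predictive rule:
    E[K_{i+1}] = E[K_i] + (theta + alpha * (k + E[K_i])) / (theta + n + i).
    """
    # log of prod_{i=0}^{m-1} (theta + n + alpha + i) / (theta + n + i),
    # computed with log-gamma for numerical stability.
    log_ratio = (
        math.lgamma(theta + n + alpha + m) - math.lgamma(theta + n + alpha)
        - math.lgamma(theta + n + m) + math.lgamma(theta + n)
    )
    return (k + theta / alpha) * math.expm1(log_ratio)


def simulate_unseen(n, k, m, alpha, theta, rng):
    """Draw one value of K_{n,m} by sampling m further observations sequentially."""
    new = 0
    for i in range(m):
        # Probability that observation n + i + 1 belongs to a new species.
        p_new = (theta + alpha * (k + new)) / (theta + n + i)
        if rng.random() < p_new:
            new += 1
    return new


def monte_carlo_summary(n, k, m, alpha, theta, draws=2000, level=0.95, seed=0):
    """Return (mean, quantile interval, Gaussian interval) from simulated draws."""
    rng = random.Random(seed)
    samples = sorted(
        simulate_unseen(n, k, m, alpha, theta, rng) for _ in range(draws)
    )
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * draws)
    hi_idx = min(draws - 1, int((1.0 - tail) * draws))
    quantile_interval = (samples[lo_idx], samples[hi_idx])
    # Illustrative only: a normal approximation using Monte Carlo moments,
    # not the closed-form asymptotic interval derived in the paper.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    gaussian_interval = (mean - z * sd, mean + z * sd)
    return mean, quantile_interval, gaussian_interval


if __name__ == "__main__":
    # Hypothetical inputs: n = 100 samples, k = 20 distinct species, m = 200 new samples.
    mean, q_int, g_int = monte_carlo_summary(100, 20, 200, alpha=0.5, theta=1.0)
    print("closed-form mean:", expected_unseen(100, 20, 200, 0.5, 1.0))
    print("Monte Carlo mean:", mean)
    print("quantile interval:", q_int)
    print("Gaussian interval:", g_int)
```

The simulation cost grows with both `draws` and `m`, which is the computational burden a closed-form Gaussian interval would remove when $m$ is large.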
Related papers
- Convergence Rate Analysis of LION [54.28350823319057]
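LION converges to a Karush-Kuhn-Tucker (KKT) point at a rate, measured by the gradient norm, that scales with $\sqrt{d}$ in the dimension and decays as a negative power of the number of iterations $K$.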
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
- Quasi-Bayesian sequential deconvolution [7.10052009802944]
We develop a principled sequential approach to estimate $f$ in a streaming or online domain.
Local and uniform Gaussian central limit theorems for $f_n$ are established, leading to credible intervals and bands for $f$.
An empirical validation of our methods is presented on synthetic and real data.
arXiv Detail & Related papers (2024-08-26T16:40:04Z)
- Optimal score estimation via empirical Bayes smoothing [13.685846094715364]
We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions.
We show that a regularized score estimator based on a Gaussian kernel attains the optimal rate, as confirmed by a matching minimax lower bound.
arXiv Detail & Related papers (2024-02-12T16:17:40Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1}\right\}+\epsilon)$, matching the best-known rates in the literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z)
- Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called under-bagging $k$-NN for imbalanced classification problems.
arXiv Detail & Related papers (2021-09-01T14:10:38Z)
- Covariance-Aware Private Mean Estimation Without Private Covariance Estimation [10.036088581191592]
We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions.
Our estimators output $\tilde{\mu}$ such that $\| \tilde{\mu} - \mu \|_{\Sigma} \leq \alpha$, where $\| \cdot \|_{\Sigma}$ is the Mahalanobis distance.
arXiv Detail & Related papers (2021-06-24T21:40:07Z)
- Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$ [5.457150493905064]
We present a novel estimator with sub-Gaussian convergence.
Our estimator does not require prior knowledge of the variance.
Our estimator construction and analysis gives a framework generalizable to other problems.
arXiv Detail & Related papers (2020-11-17T02:47:24Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of these algorithms over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2}+ \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)