Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
- URL: http://arxiv.org/abs/2001.03985v3
- Date: Tue, 27 Oct 2020 20:08:25 GMT
- Title: Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
- Authors: Bas van Opheusden, Luigi Acerbi and Wei Ji Ma
- Abstract summary: inverse binomial sampling (IBS) can estimate the log-likelihood of an entire data set efficiently and without bias.
IBS produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods.
- Score: 9.66840768820136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fate of scientific hypotheses often relies on the ability of a
computational model to explain the data, quantified in modern statistical
approaches by the likelihood function. The log-likelihood is the key element
for parameter estimation and model evaluation. However, the log-likelihood of
complex models in fields such as computational biology and neuroscience is
often intractable to compute analytically or numerically. In those cases,
researchers can often only estimate the log-likelihood by comparing observed
data with synthetic observations generated by model simulations. Standard
techniques to approximate the likelihood via simulation either use summary
statistics of the data or are at risk of producing severe biases in the
estimate. Here, we explore another method, inverse binomial sampling (IBS),
which can estimate the log-likelihood of an entire data set efficiently and
without bias. For each observation, IBS draws samples from the simulator model
until one matches the observation. The log-likelihood estimate is then a
function of the number of samples drawn. The variance of this estimator is
uniformly bounded, it achieves the minimum variance possible for an unbiased
estimator, and we can compute calibrated estimates of it. We provide theoretical
arguments in favor of IBS and an empirical assessment of the method for
maximum-likelihood estimation with simulation-based models. As case studies, we
take three model-fitting problems of increasing complexity from computational
and cognitive neuroscience. In all problems, IBS generally produces lower error
in the estimated parameters and maximum log-likelihood values than alternative
sampling methods with the same average number of samples. Our results
demonstrate the potential of IBS as a practical, robust, and easy-to-implement
method for log-likelihood evaluation when exact techniques are not available.
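
The estimator described above is compact enough to sketch. For one observation whose (unknown) probability under the model is p, IBS simulates until the first match; if that match occurs on draw K, the estimate of log p is the negated partial harmonic sum -(1 + 1/2 + ... + 1/(K-1)), which is exactly unbiased. Below is a minimal Python sketch under stated assumptions: `simulate` is a hypothetical user-supplied function returning one synthetic observation, observations support an equality test, and every observation has nonzero probability under the model (otherwise the loop never terminates; the paper discusses guarding against this, e.g., via a lapse rate).

```python
import numpy as np

def ibs_loglik(simulate, theta, data, rng=None):
    """One IBS estimate of the total log-likelihood of `data` under `theta`.

    For each observation, draw from the simulator until the first match;
    if the match occurs on draw K, the contribution is -sum_{k=1}^{K-1} 1/k,
    an unbiased estimate of log p(observation | theta).
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for obs in data:
        k = 1
        while simulate(theta, rng) != obs:  # resample until the first match
            total -= 1.0 / k                # partial harmonic sum, negated
            k += 1
    return total
```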
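As a quick illustration of the unbiasedness claim, the sketch can be checked on a Bernoulli toy model where the exact log-likelihood is available in closed form; this example is ours, not from the paper. Averaging independent IBS repeats converges to the exact value, and the spread across repeats gives a simple sample-based uncertainty estimate (the paper's calibrated variance estimates are analytical; for a single observation the variance is the dilogarithm Li2(1-p), hence bounded by pi^2/6).

```python
# Toy check on a Bernoulli model, where the exact log-likelihood is known.
def bernoulli_sim(theta, rng):
    return int(rng.random() < theta)  # one synthetic 0/1 observation

rng = np.random.default_rng(0)
data = [1, 0, 1, 1, 0]
theta = 0.6
exact = sum(np.log(theta if x == 1 else 1 - theta) for x in data)

# Averaging independent IBS repeats converges to the exact value; the
# sample standard error quantifies the estimator's noise.
reps = [ibs_loglik(bernoulli_sim, theta, data, rng) for _ in range(2000)]
print(f"exact {exact:.3f}  IBS {np.mean(reps):.3f} "
      f"+/- {np.std(reps) / np.sqrt(len(reps)):.3f}")
```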
Related papers
- Robust Estimation for Kernel Exponential Families with Smoothed Total Variation Distances [2.317910166616341]
In statistical inference, we commonly assume that samples are independent and identically distributed from a probability distribution.
In this paper, we explore the application of GAN-like estimators to a general class of statistical models.
arXiv Detail & Related papers (2024-10-28T05:50:47Z)
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this "model completion" learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
- A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem.
We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations.
Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
arXiv Detail & Related papers (2024-02-26T06:20:28Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Nonparametric likelihood-free inference with Jensen-Shannon divergence for simulator-based models with categorical output [1.4298334143083322]
Likelihood-free inference for simulator-based statistical models has attracted a surge of interest, both in the machine learning and statistics communities.
Here we derive a set of theoretical results to enable estimation, hypothesis testing and construction of confidence intervals for model parameters using computational properties of the Jensen-Shannon divergence.
Such an approximation offers a rapid alternative to more computation-intensive approaches and can be attractive for diverse applications of simulator-based models.
arXiv Detail & Related papers (2022-05-22T18:00:13Z)
- Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data [15.696653979226113]
We investigate the issue of parameter estimation with nonuniform negative sampling for imbalanced data.
We derive a general inverse probability weighted (IPW) estimator and obtain the optimal sampling probability that minimizes its variance.
Both theoretical and empirical results demonstrate the effectiveness of our method. (A generic sketch of negative subsampling with a log-odds correction appears after this list.)
arXiv Detail & Related papers (2021-10-25T15:37:22Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- On a Variational Approximation based Empirical Likelihood ABC Method [1.5293427903448025]
We propose an easy-to-use empirical likelihood ABC method in this article.
We show that the target log-posterior can be approximated as a sum of an expected joint log-likelihood and the differential entropy of the data generating density.
arXiv Detail & Related papers (2020-11-12T21:24:26Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
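
A side note on the Nonuniform Negative Sampling and Log Odds Correction entry above: the basic mechanics of negative subsampling are standard and easy to sketch. The example below keeps every positive, subsamples negatives uniformly at a made-up rate rho, and corrects the fitted intercept by log(rho); it is a generic uniform-subsampling illustration of the log-odds correction, not that paper's optimal nonuniform scheme or IPW estimator, and all data and rates are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Imbalanced synthetic data from a true model P(y=1|x) = sigmoid(b0 + b1*x).
b0_true, b1_true = -4.0, 2.0
x = rng.normal(size=200_000)
p = 1.0 / (1.0 + np.exp(-(b0_true + b1_true * x)))
y = (rng.random(x.size) < p).astype(int)

# Keep every positive; subsample negatives uniformly at rate rho.
rho = 0.05
keep = (y == 1) | (rng.random(x.size) < rho)
x_s, y_s = x[keep], y[keep]

# Fit a (nearly) unpenalized logistic regression on the subsample, then
# apply the log-odds correction: subsampling negatives at rate rho raises
# the log-odds by -log(rho), so add log(rho) back to the intercept.
clf = LogisticRegression(C=1e6, max_iter=1000).fit(x_s.reshape(-1, 1), y_s)
b0_hat = clf.intercept_[0] + np.log(rho)
print(b0_hat, clf.coef_[0, 0])  # should land near (-4.0, 2.0)
```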
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.