In-Context Parametric Inference: Point or Distribution Estimators?
- URL: http://arxiv.org/abs/2502.11617v1
- Date: Mon, 17 Feb 2025 10:00:24 GMT
- Title: In-Context Parametric Inference: Point or Distribution Estimators?
- Authors: Sarthak Mittal, Yoshua Bengio, Nikolay Malkin, Guillaume Lajoie,
- Abstract summary: We show that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
- Score: 66.22308335324239
- License:
- Abstract: Bayesian and frequentist inference are two fundamental paradigms in statistical estimation. Bayesian methods treat hypotheses as random variables, incorporating priors and updating beliefs via Bayes' theorem, whereas frequentist methods assume fixed but unknown hypotheses, relying on estimators like maximum likelihood. While extensive research has compared these approaches, the frequentist paradigm of obtaining point estimates has become predominant in deep learning, as Bayesian inference is challenging due to the computational complexity and the approximation gap of posterior estimation methods. However, a good understanding of trade-offs between the two approaches is lacking in the regime of amortized estimators, where in-context learners are trained to estimate either point values via maximum likelihood or maximum a posteriori estimation, or full posteriors using normalizing flows, score-based diffusion samplers, or diagonal Gaussian approximations, conditioned on observations. To help resolve this, we conduct a rigorous comparative analysis spanning diverse problem settings, from linear models to shallow neural networks, with a robust evaluation framework assessing both in-distribution and out-of-distribution generalization on tractable tasks. Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems, and we further discuss why this might be the case.
Related papers
- A Mean Field Approach to Empirical Bayes Estimation in High-dimensional
Linear Regression [8.345523969593492]
We study empirical Bayes estimation in high-dimensional linear regression.
We adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022).
This provides the first rigorous empirical Bayes method in a high-dimensional regression setting without sparsity.
arXiv Detail & Related papers (2023-09-28T20:51:40Z) - Adversarial robustness of amortized Bayesian inference [3.308743964406687]
Amortized Bayesian inference is to initially invest computational cost in training an inference network on simulated data.
We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples.
We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator.
arXiv Detail & Related papers (2023-05-24T10:18:45Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Density Estimation with Autoregressive Bayesian Predictives [1.5771347525430772]
In the context of density estimation, the standard Bayesian approach is to target the posterior predictive.
We develop a novel parameterization of the bandwidth using an autoregressive neural network that maps the data into a latent space.
arXiv Detail & Related papers (2022-06-13T20:43:39Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Quantifying Uncertainty in Deep Spatiotemporal Forecasting [67.77102283276409]
We describe two types of forecasting problems: regular grid-based and graph-based.
We analyze UQ methods from both the Bayesian and the frequentist point view, casting in a unified framework via statistical decision theory.
Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical computational trade-offs for different UQ methods.
arXiv Detail & Related papers (2021-05-25T14:35:46Z) - Simulation comparisons between Bayesian and de-biased estimators in
low-rank matrix completion [0.0]
We study the low-rank matrix completion problem, a class of machine learning problems, that aims at the prediction of missing entries in a partially observed matrix.
We compare the Bayesian approaches and a recently introduced de-biased estimator which provides a useful way to build confidence intervals of interest.
We find that the empirical coverage rate of the confidence intervals obtained by the de-biased estimator for an entry is absolutely lower than of the considered credible interval.
arXiv Detail & Related papers (2021-03-22T12:02:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.