Learning to Estimate Without Bias
- URL: http://arxiv.org/abs/2110.12403v3
- Date: Wed, 29 Nov 2023 10:01:05 GMT
- Title: Learning to Estimate Without Bias
- Authors: Tzvi Diskin, Yonina C. Eldar and Ami Wiesel
- Abstract summary: The Gauss Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
- Score: 57.82628598276623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Gauss Markov theorem states that the weighted least squares estimator is
a linear minimum variance unbiased estimator (MVUE) in linear models. In this
paper, we take a first step towards extending this result to non-linear
settings via deep learning with bias constraints. The classical approach to
designing non-linear MVUEs is through maximum likelihood estimation (MLE) which
often involves computationally challenging optimizations. On the other hand,
deep learning methods allow for non-linear estimators with fixed computational
complexity. Learning based estimators perform optimally on average with respect
to their training set but may suffer from significant bias in other parameters.
To avoid this, we propose to add a simple bias constraint to the loss function,
resulting in an estimator we refer to as Bias Constrained Estimator (BCE). We
prove that this yields asymptotic MVUEs that behave similarly to the classical
MLEs and asymptotically attain the Cramér-Rao bound. We demonstrate the
advantages of our approach in the context of signal-to-noise ratio estimation
as well as covariance estimation. A second motivation for BCE is in applications
where multiple estimates of the same unknown are averaged for improved
performance. Examples include distributed sensor networks and test-time data
augmentation. In such applications, we show that BCE leads to asymptotically
consistent estimators.
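The bias-constraint idea in the abstract can be illustrated with a minimal sketch. Assuming a toy scalar model x ~ N(theta, 1) and a deliberately simple affine estimator in place of the paper's deep networks, the training loss adds an empirical squared-bias penalty (the bias at each theta is estimated by averaging the estimator over repeated measurements of that theta) to the usual MSE. The data sizes, learning rate, and penalty weight below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: estimate theta from noisy observations x ~ N(theta, 1).
# Each of m parameter values theta_i gets n observations, so the
# per-parameter bias can be estimated empirically inside the loss.
m, n = 300, 50
theta = rng.uniform(-2.0, 2.0, size=m)            # true parameters
x = theta[:, None] + rng.standard_normal((m, n))  # noisy measurements

def fit_affine(lam, lr=0.02, steps=2000):
    """Gradient descent on MSE + lam * (empirical squared bias)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        err = a * x + b - theta[:, None]   # per-sample estimation error
        xbar = x.mean(axis=1)
        bias = a * xbar + b - theta        # empirical bias at each theta_i
        grad_a = 2 * (err * x).mean() + lam * 2 * (bias * xbar).mean()
        grad_b = 2 * err.mean() + lam * 2 * bias.mean()
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

a_mse, _ = fit_affine(lam=0.0)   # plain MSE training: shrinks toward 0 (biased)
a_bce, _ = fit_affine(lam=10.0)  # bias-constrained: pushed toward unbiased a = 1
```

With lam = 0 the learned slope settles near the MSE-optimal but biased shrinkage value (about 4/7 in this setup), while a large bias penalty pushes it toward the unbiased slope a = 1 at some cost in variance, which is the trade-off BCE controls.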
Related papers
- Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random [2.8165314121189247]
In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values.
We develop a systematic fine-grained dynamic learning framework to jointly optimize bias and variance.
arXiv Detail & Related papers (2024-05-24T10:07:09Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Error Reduction from Stacked Regressions [12.657895453939298]
Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy.
In this paper, we learn these weights analogously by minimizing a regularized version of the empirical risk subject to a nonnegativity constraint.
Thanks to an adaptive shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them.
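A rough sketch of the stacking step described above, assuming a ridge-style regularizer and projected gradient descent to enforce the nonnegativity constraint (the paper's exact regularizer and solver may differ); the three base estimators and all names here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on one feature; three base regressors of
# varying quality produce predictions on a held-out set.
n = 500
xf = rng.uniform(-1.0, 1.0, n)
y = 2.0 * xf + 0.1 * rng.standard_normal(n)

preds = np.stack([
    2.0 * xf,                 # accurate estimator
    2.0 * xf + 0.5,           # systematically biased estimator
    rng.standard_normal(n),   # pure-noise estimator
], axis=1)                    # shape (n, 3)

def stack_weights(P, y, lam=0.1, lr=0.01, steps=2000):
    """Minimize ||y - P w||^2 / n + lam * ||w||^2 subject to w >= 0
    by projected gradient descent."""
    w = np.ones(P.shape[1]) / P.shape[1]
    for _ in range(steps):
        grad = 2 * P.T @ (P @ w - y) / len(y) + 2 * lam * w
        w = np.maximum(w - lr * grad, 0.0)  # project onto w >= 0
    return w

w = stack_weights(preds, y)
```

The nonnegativity projection is what produces the adaptive shrinkage behavior: unhelpful base estimators (here the pure-noise one) are driven to weight zero rather than receiving negative weights that would inflate variance.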
arXiv Detail & Related papers (2023-09-18T15:42:12Z)
- Scalable method for Bayesian experimental design without integrating over posterior distribution [0.0]
We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems.
A-optimality is a widely used and easy-to-interpret criterion for Bayesian experimental design.
This study presents a novel likelihood-free approach to the A-optimal experimental design.
arXiv Detail & Related papers (2023-06-30T12:40:43Z)
- Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
- On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.