Expected Validation Performance and Estimation of a Random Variable's Maximum
- URL: http://arxiv.org/abs/2110.00613v1
- Date: Fri, 1 Oct 2021 18:48:47 GMT
- Title: Expected Validation Performance and Estimation of a Random Variable's Maximum
- Authors: Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
- Abstract summary: We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
- Score: 48.83713377993604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in NLP is often supported by experimental results, and improved
reporting of such results can lead to better understanding and more
reproducible science. In this paper we analyze three statistical estimators for
expected validation performance, a tool used for reporting performance (e.g.,
accuracy) as a function of computational budget (e.g., number of hyperparameter
tuning experiments). Where previous work analyzing such estimators focused on
the bias, we also examine the variance and mean squared error (MSE). In both
synthetic and realistic scenarios, we evaluate three estimators and find the
unbiased estimator has the highest variance, and the estimator with the
smallest variance has the largest bias; the estimator with the smallest MSE
strikes a balance between bias and variance, displaying a classic bias-variance
tradeoff. We use expected validation performance to compare between different
models, and analyze how frequently each estimator leads to drawing incorrect
conclusions about which of two models performs best. We find that the two
biased estimators lead to the fewest incorrect conclusions, which hints at the
importance of minimizing variance and MSE.
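For concreteness, the sketch below shows two standard estimators of expected validation performance, i.e., the expected maximum validation score over a budget of n hyperparameter trials, computed from k observed scores: the plug-in estimator built on the empirical CDF (the biased formulation advocated in "Show Your Work") and the unbiased order-statistic estimator (from "Showing Your Work Doesn't Always Work"). Whether these correspond exactly to the estimators analyzed in this paper is an assumption, and the function names are illustrative.

```python
import numpy as np
from math import comb

def expected_max_plugin(scores, n):
    """Plug-in (empirical-CDF) estimate of the expected max of n draws.
    Biased but low-variance; `scores` are k validation results obtained
    from random hyperparameter assignments."""
    v = np.sort(np.asarray(scores, dtype=float))
    k = len(v)
    cdf_pow = (np.arange(1, k + 1) / k) ** n                 # P(max <= v_(i))
    point_mass = np.diff(np.concatenate(([0.0], cdf_pow)))   # P(max = v_(i))
    return float(np.dot(point_mass, v))

def expected_max_unbiased(scores, n):
    """Unbiased order-statistic estimate of the expected max of n i.i.d.
    draws, computable whenever k >= n."""
    v = np.sort(np.asarray(scores, dtype=float))
    k = len(v)
    if n > k:
        raise ValueError("need at least n observed scores")
    weights = [comb(i - 1, n - 1) / comb(k, n) for i in range(1, k + 1)]
    return float(np.dot(weights, v))

# Example: expected best accuracy at a budget of n = 5 trials,
# estimated from k = 20 observed validation accuracies.
rng = np.random.default_rng(0)
observed = rng.uniform(0.70, 0.85, size=20)
print(expected_max_plugin(observed, 5), expected_max_unbiased(observed, 5))
```

Plotting either estimate for n = 1..k yields the expected-validation-performance curve as a function of budget; the paper's point is that the choice of estimator changes both the curve and the conclusions drawn from comparisons between models.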
Related papers
- Precise Model Benchmarking with Only a Few Observations [6.092112060364272]
We propose an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately.
EB consistently provides more precise estimates of LLM performance than the direct and regression approaches (a minimal shrinkage sketch follows this entry).
arXiv Detail & Related papers (2024-10-07T17:26:31Z)
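The precision-weighted shrinkage idea behind such an empirical Bayes estimator can be sketched as follows; the weighting rule and the name `eb_shrinkage` are generic illustrations, assuming the direct estimate is a noisy subgroup mean and the regression estimate is a smoother pooled prediction, not the paper's exact procedure.

```python
import numpy as np

def eb_shrinkage(direct, direct_se, regression, tau2):
    """Precision-weighted combination of a noisy per-subgroup ("direct")
    estimate with a smoother regression-based estimate. `tau2` is an
    assumed variance of true subgroup effects around the regression
    prediction; noisier subgroups are shrunk harder toward it."""
    direct = np.asarray(direct, dtype=float)
    regression = np.asarray(regression, dtype=float)
    var_direct = np.asarray(direct_se, dtype=float) ** 2
    w = tau2 / (tau2 + var_direct)        # weight on the direct estimate
    return w * direct + (1.0 - w) * regression

# Example: three subgroups with very different amounts of data.
print(eb_shrinkage(direct=[0.62, 0.80, 0.55],
                   direct_se=[0.02, 0.10, 0.05],
                   regression=[0.65, 0.66, 0.60],
                   tau2=0.002))  # hypothetical between-subgroup variance
```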
- High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed the pairs estimator.
By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance.
Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z)
- De-biasing "bias" measurement [20.049916973204102]
We show that metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying quantities they purport to represent.
We propose the "double-corrected" variance estimator, which provides unbiased estimates and uncertainty quantification of the variance of model performance across groups (a noise-correction sketch follows this entry).
arXiv Detail & Related papers (2022-05-11T20:51:57Z)
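A generic noise-corrected variance computation conveys the idea: the naive variance of observed group-wise accuracies overstates the variance of the groups' true performance because each group estimate carries sampling noise, so the correction subtracts the average per-group noise. This sketch illustrates that correction; it is not necessarily the paper's exact "double-corrected" estimator.

```python
import numpy as np

def noise_corrected_group_variance(correct_counts, group_sizes):
    """Estimate the across-group variance of true accuracy, correcting the
    naive between-group variance for per-group sampling noise."""
    k = np.asarray(correct_counts, dtype=float)
    n = np.asarray(group_sizes, dtype=float)
    p = k / n                                   # observed accuracy per group
    between = np.var(p, ddof=1)                 # naive across-group variance
    sampling = np.mean(p * (1 - p) / (n - 1))   # unbiased estimate of each group's noise
    return max(between - sampling, 0.0)

# Example: four groups with unequal sample sizes.
print(noise_corrected_group_variance([90, 45, 160, 70], [100, 60, 200, 90]))
```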
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference (a minimal DoC sketch follows this entry).
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
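A minimal sketch of the difference-of-confidences signal, assuming softmax probability outputs are available for a held-out source set and a shifted target set; the accuracy-drop mapping in the second function is a simplification, not the paper's full protocol.

```python
import numpy as np

def difference_of_confidences(source_probs, target_probs):
    """DoC: mean max-softmax confidence on the source set minus the same
    quantity on the shifted target set. Inputs are (num_examples,
    num_classes) arrays of predicted class probabilities."""
    src = np.max(np.asarray(source_probs, dtype=float), axis=1).mean()
    tgt = np.max(np.asarray(target_probs, dtype=float), axis=1).mean()
    return src - tgt

def predicted_target_accuracy(source_accuracy, doc):
    """Simplest use of DoC: assume the accuracy drop under shift roughly
    tracks the confidence drop."""
    return source_accuracy - doc
```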
- Machine Learning for Variance Reduction in Online Experiments [1.9181913148426697]
We propose a machine learning regression-adjusted treatment effect estimator, which we call MLRATE.
MLRATE uses machine learning predictors of the outcome to reduce estimator variance.
In A/A tests, for a set of 48 outcome metrics commonly monitored in Facebook experiments, the estimator has over 70% lower variance than the simple difference-in-means estimator (a regression-adjustment sketch follows this entry).
arXiv Detail & Related papers (2021-06-14T09:35:54Z)
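The core variance-reduction move can be sketched as a regression-adjusted difference in means: subtract a cross-fitted ML prediction of the outcome before comparing treatment and control. This is a generic adjustment in the spirit of MLRATE, not its exact GLM-based formulation; the data and predictor below are hypothetical.

```python
import numpy as np

def regression_adjusted_ate(y, treated, g_hat):
    """Difference in means after subtracting an outcome prediction g_hat(X);
    g_hat must be cross-fitted (not trained on the unit it scores) so the
    adjustment does not bias the estimate."""
    resid = np.asarray(y, dtype=float) - np.asarray(g_hat, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return resid[treated].mean() - resid[~treated].mean()

# Hypothetical experiment where a covariate explains most outcome noise.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
t = rng.integers(0, 2, size=10_000).astype(bool)
y = 2.0 * x + 0.1 * t + rng.normal(scale=0.5, size=10_000)
print(regression_adjusted_ate(y, t, g_hat=2.0 * x))   # close to the true 0.1
```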
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
- CoinPress: Practical Private Mean and Covariance Estimation [18.6419638570742]
We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data.
We show that their error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods.
arXiv Detail & Related papers (2020-06-11T17:17:28Z)
- Showing Your Work Doesn't Always Work [73.63200097493576]
"Show Your Work: Improved Reporting of Experimental Results" advocates for reporting the expected validation effectiveness of the best-tuned model.
We analytically show that their estimator is biased and uses error-prone assumptions.
We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation.
arXiv Detail & Related papers (2020-04-28T17:59:01Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage (a cross-fit doubly-robust sketch follows this entry).
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
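A minimal cross-fit doubly-robust (AIPW-style) estimator of the ACE looks like the sketch below, with nuisance models fit on one fold and evaluated on the held-out fold; the gradient-boosted learners and the function name are illustrative choices, not the specific estimators compared in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def crossfit_aipw(X, a, y, n_splits=2, seed=0):
    """Cross-fit AIPW estimate of the average causal effect of a binary
    treatment `a` on outcome `y`, given confounders `X`."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=int)
    y = np.asarray(y, dtype=float)
    psi = np.zeros_like(y)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance models: propensity score and per-arm outcome regressions.
        ps = GradientBoostingClassifier().fit(X[train], a[train])
        m1 = GradientBoostingRegressor().fit(X[train][a[train] == 1], y[train][a[train] == 1])
        m0 = GradientBoostingRegressor().fit(X[train][a[train] == 0], y[train][a[train] == 0])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        at, yt = a[test], y[test]
        # AIPW influence-function contributions on the held-out fold.
        psi[test] = (at * (yt - mu1) / e + mu1) - ((1 - at) * (yt - mu0) / (1 - e) + mu0)
    return float(psi.mean())
```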