Rethinking Evaluation Metric for Probability Estimation Models Using Esports Data
- URL: http://arxiv.org/abs/2309.06248v1
- Date: Tue, 12 Sep 2023 14:04:12 GMT
- Title: Rethinking Evaluation Metric for Probability Estimation Models Using Esports Data
- Authors: Euihyeon Choi, Jooyoung Kim, Wonkyung Lee
- Abstract summary: We propose a novel metric called the Balance score, a simple yet effective metric in terms of six good properties that a probability estimation metric should have.
Under general conditions, we also find that the Balance score can be an effective approximation of the true expected calibration error.
- Score: 8.10304644344495
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Probability estimation models play an important role in various fields, such
as weather forecasting, recommendation systems, and sports analysis. Among
several models that estimate probabilities, it is difficult to evaluate which
model gives reliable probabilities, since the ground-truth probabilities are
not available. The win probability estimation model for esports, which
calculates the win probability under a certain game state, is one of the
actively studied problems in probability estimation. However, most previous
works evaluated their models using accuracy, a metric that can only measure
discrimination performance. In this work, we first investigate the Brier score
and the Expected Calibration Error (ECE) as replacements for accuracy as the
performance evaluation metric for win probability estimation models in the
esports field. Based on this analysis, we propose a novel metric called the
Balance score, which is simple yet effective in terms of six good properties
that a probability estimation metric should have. Under general conditions, we
also find that the Balance score can be an effective approximation of the true
expected calibration error, which ECE approximates only imperfectly through
binning. Extensive evaluations using simulation studies and real game snapshot
data demonstrate the promising potential of adopting the proposed metric not
only for win probability estimation in esports but also for evaluating general
probability estimation models.
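As background for the two baseline metrics the abstract contrasts, the following is a minimal sketch (not the authors' code; variable names are illustrative) of the Brier score and a binned Expected Calibration Error for binary win probabilities. The binning step is exactly the approximation of the true calibration error that the paper argues the Balance score can avoid.

```python
import numpy as np

def brier_score(p_win, won):
    """Mean squared difference between predicted win probability and the 0/1 outcome."""
    p_win, won = np.asarray(p_win, dtype=float), np.asarray(won, dtype=float)
    return float(np.mean((p_win - won) ** 2))

def binned_ece(p_win, won, n_bins=10):
    """Binned ECE for binary outcomes: weighted average gap between the mean
    predicted probability and the empirical win rate inside each bin."""
    p_win, won = np.asarray(p_win, dtype=float), np.asarray(won, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so that p_win == 1.0 falls in the last bin.
    bin_ids = np.clip(np.digitize(p_win, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(p_win[in_bin].mean() - won[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# Toy example: predicted win probabilities for game snapshots and observed outcomes.
p = [0.9, 0.7, 0.55, 0.3, 0.2, 0.8, 0.45, 0.6]
y = [1,   1,   0,    0,   1,   1,   0,    1]
print(brier_score(p, y), binned_ece(p, y, n_bins=5))
```

Note that the ECE value depends on the choice of n_bins, which is one source of the imperfect approximation the abstract refers to.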
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Deep Probability Segmentation: Are segmentation models probability estimators? [0.7646713951724011]
We apply Calibrated Probability Estimation to segmentation tasks to evaluate its impact on model calibration.
Results indicate that while CaPE improves calibration, its effect is less pronounced compared to classification tasks.
We also investigated the influence of dataset size and bin optimization on the effectiveness of calibration.
arXiv Detail & Related papers (2024-09-19T07:52:19Z)
- Probabilistic Scores of Classifiers, Calibration is not Enough [0.32985979395737786]
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications.
In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions.
Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
arXiv Detail & Related papers (2024-08-06T19:53:00Z)
- Confidence-based Estimators for Predictive Performance in Model Monitoring [0.5399800035598186]
After a machine learning model has been deployed into production, its predictive performance needs to be monitored.
Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed.
We show that under certain general assumptions, the Average Confidence (AC) method is an unbiased and consistent estimator of model accuracy (see the sketch after this list).
arXiv Detail & Related papers (2024-07-11T16:28:31Z)
- Usable Region Estimate for Assessing Practical Usability of Medical Image Segmentation Models [32.56957759180135]
We aim to quantitatively measure the practical usability of medical image segmentation models.
We first propose a measure, Correctness-Confidence Rank Correlation (CCRC), to capture how predictions' confidence estimates correlate with their correctness scores in rank.
We then propose Usable Region Estimate (URE), which simultaneously quantifies predictions' correctness and reliability of confidence assessments in one estimate.
arXiv Detail & Related papers (2022-07-01T02:33:44Z)
- Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify uncertainty during forecasting using Bayesian approximation, since deterministic approaches fail to capture it.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
- Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE, the density of states estimator, is proposed for unsupervised out-of-distribution (OOD) detection.
We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
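As referenced in the AC and ATC entries above, here is a minimal sketch, under the commonly stated definitions and with purely illustrative names (not code from either paper), of how these two confidence-based accuracy estimates can be computed when target labels are unavailable.

```python
import numpy as np

def average_confidence(conf_target):
    """AC estimate: mean predicted confidence over the unlabeled target set."""
    return float(np.mean(conf_target))

def atc_threshold(conf_source, correct_source):
    """Fit the ATC threshold on labeled source data so that the fraction of
    source examples whose confidence exceeds it matches the source accuracy."""
    conf_source = np.asarray(conf_source, dtype=float)
    source_acc = float(np.mean(correct_source))
    # The (1 - accuracy) quantile leaves roughly an `accuracy`-sized fraction above it.
    return float(np.quantile(conf_source, 1.0 - source_acc))

def atc_estimate(conf_target, threshold):
    """ATC estimate: fraction of unlabeled target examples above the threshold."""
    return float(np.mean(np.asarray(conf_target, dtype=float) > threshold))

# Toy usage with made-up confidence scores.
conf_src = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
correct_src = [1, 1, 1, 0, 1, 0]           # 4/6 correct on the labeled source set
conf_tgt = [0.85, 0.75, 0.65, 0.50, 0.90]  # unlabeled target set
t = atc_threshold(conf_src, correct_src)
print(average_confidence(conf_tgt), atc_estimate(conf_tgt, t))
```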
This list is automatically generated from the titles and abstracts of the papers on this site.