Model Comparison and Calibration Assessment: User Guide for Consistent
Scoring Functions in Machine Learning and Actuarial Practice
- URL: http://arxiv.org/abs/2202.12780v3
- Date: Wed, 26 Jul 2023 14:55:02 GMT
- Title: Model Comparison and Calibration Assessment: User Guide for Consistent
Scoring Functions in Machine Learning and Actuarial Practice
- Authors: Tobias Fissler, Christian Lorentzen, Michael Mayer
- Abstract summary: This user guide revisits and clarifies statistical techniques to assess the calibration or adequacy of a model.
It focuses mainly on the pedagogical presentation of existing results and of best practice.
Results are accompanied and illustrated by two real data case studies on workers' compensation and customer churn.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the main tasks of actuaries and data scientists is to build good
predictive models for certain phenomena such as the claim size or the number of
claims in insurance. These models ideally exploit given feature information to
enhance the accuracy of prediction. This user guide revisits and clarifies
statistical techniques to assess the calibration or adequacy of a model on the
one hand, and to compare and rank different models on the other hand. In doing
so, it emphasises the importance of specifying the prediction target functional
at hand a priori (e.g. the mean or a quantile) and of choosing the scoring
function in model comparison in line with this target functional. Guidance for
the practical choice of the scoring function is provided. Striving to bridge
the gap between science and daily practice in application, it focuses mainly on
the pedagogical presentation of existing results and of best practice. The
results are accompanied and illustrated by two real data case studies on
workers' compensation and customer churn.
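To make the central point concrete, the following minimal sketch (plain NumPy; the simulated claim sizes and constant predictions are illustrative assumptions, not the paper's case studies) shows that the squared error, which is consistent for the mean, and the pinball loss at level 0.5, which is consistent for the median, rank the same two predictions in opposite ways:
```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated skewed "claim sizes" (illustrative toy data, not the paper's).
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Two constant "models": one targets the mean, the other the median.
pred_mean = y.mean()        # roughly exp(0.5) ~ 1.65 for this distribution
pred_median = np.median(y)  # roughly exp(0)   = 1.0

def squared_error(y, p):
    """Consistent scoring function for the mean functional."""
    return np.mean((y - p) ** 2)

def pinball_loss(y, p, tau=0.5):
    """Consistent scoring function for the tau-quantile (tau=0.5: median)."""
    diff = y - p
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# The squared error (target functional: mean) prefers the mean prediction ...
print(squared_error(y, pred_mean) < squared_error(y, pred_median))   # True
# ... while the pinball loss at tau=0.5 (target: median) prefers the median.
print(pinball_loss(y, pred_median) < pinball_loss(y, pred_mean))     # True
```
The ranking flips because each scoring function rewards the prediction that matches its own target functional, which is exactly why the target functional has to be fixed a priori before models are compared.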
Related papers
- A performance characteristic curve for model evaluation: the application
in information diffusion prediction [3.8711489380602804]
We propose a metric based on information entropy to quantify the randomness in diffusion data, and then identify a scaling pattern between this randomness and the prediction accuracy of the model.
Data points obtained for different sequence lengths, system sizes, and randomness levels all collapse onto a single curve, capturing a model's inherent capability of making correct predictions.
The validity of the curve is tested with three prediction models from the same family, reaching conclusions in line with existing studies.
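As a rough, assumed illustration of an entropy-based randomness measure (the paper's exact metric is not reproduced here), the Shannon entropy of empirical event frequencies already separates easy from hard-to-predict diffusion data:
```python
import numpy as np

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of events."""
    _, counts = np.unique(events, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Illustrative diffusion sequences (assumed toy data, not the paper's):
# nearly deterministic spreading vs. highly random spreading.
low_randomness = ["A", "A", "A", "B", "A", "A"]
high_randomness = ["A", "B", "C", "D", "E", "F"]

print(shannon_entropy(low_randomness))   # small -> easier to predict
print(shannon_entropy(high_randomness))  # large -> harder to predict
```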
arXiv Detail & Related papers (2023-09-18T07:32:57Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
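A minimal sketch of this thresholded-confidence idea, assuming maximum softmax probability as the confidence score (the paper also studies other scores; the numbers below are placeholders):
```python
import numpy as np

def atc_fit_threshold(val_confidence, val_correct):
    """Choose a threshold so that the fraction of validation points above it
    matches the validation accuracy (the core idea of ATC)."""
    target_acc = val_correct.mean()
    # The fraction above the q-th quantile is roughly (1 - q), so q = 1 - accuracy.
    return np.quantile(val_confidence, 1.0 - target_acc)

def atc_predict_accuracy(target_confidence, threshold):
    """Predicted target-domain accuracy: share of unlabeled target examples
    whose confidence exceeds the learned threshold."""
    return float((target_confidence > threshold).mean())

# Toy usage with made-up confidences (placeholders, not real model outputs).
val_conf = np.array([0.95, 0.80, 0.60, 0.99, 0.55])
val_correct = np.array([1, 1, 0, 1, 0])
thr = atc_fit_threshold(val_conf, val_correct)
print(atc_predict_accuracy(np.array([0.90, 0.40, 0.70, 0.97]), thr))
```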
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Post-hoc Models for Performance Estimation of Machine Learning Inference [22.977047604404884]
Estimating how well a machine learning model performs during inference is critical in a variety of scenarios.
We systematically generalize performance estimation to a diverse set of metrics and scenarios.
We find that the proposed post-hoc models consistently outperform standard confidence-based baselines.
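One plausible reading of such a post-hoc model (the paper's own features and model family are not reproduced here), with scikit-learn's LogisticRegression as a stand-in estimator: fit a secondary model on held-out data that maps the primary model's softmax outputs to the probability of being correct, then average its predictions on new data.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_posthoc_estimator(heldout_probs, heldout_correct):
    """Fit a secondary ("post-hoc") model mapping the primary model's softmax
    outputs to the probability that its prediction is correct.
    Assumes heldout_correct contains both 0s and 1s."""
    feats = np.sort(heldout_probs, axis=1)[:, ::-1]  # sorted class probabilities
    return LogisticRegression(max_iter=1000).fit(feats, heldout_correct)

def estimate_accuracy(posthoc, new_probs):
    """Estimated accuracy on unlabeled data: mean predicted correctness."""
    feats = np.sort(new_probs, axis=1)[:, ::-1]
    return float(posthoc.predict_proba(feats)[:, 1].mean())
```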
arXiv Detail & Related papers (2021-10-06T02:20:37Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
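This is not ALT-MAS itself, but a minimal NumPy sketch of the underlying idea of using a BNN's posterior predictive distribution as a surrogate labeler when estimating a metric for a model under test (the posterior samples are assumed to be given, e.g. from MC dropout):
```python
import numpy as np

def estimate_accuracy_with_bnn(bnn_prob_samples, mut_predictions):
    """Estimate the accuracy of a model-under-test (MUT) on unlabeled data,
    using a BNN's posterior predictive samples as a surrogate for the labels.

    bnn_prob_samples: array (n_mc_samples, n_points, n_classes) of class
        probabilities drawn from the BNN posterior (e.g. via MC dropout).
    mut_predictions: int array (n_points,) of the MUT's predicted class indices.
    """
    # For each posterior sample, the probability that the BNN "label"
    # agrees with the MUT prediction at every point.
    agree = np.take_along_axis(
        bnn_prob_samples, mut_predictions[None, :, None], axis=2
    )[..., 0]                          # shape (n_mc_samples, n_points)
    per_sample_acc = agree.mean(axis=1)
    return per_sample_acc.mean(), per_sample_acc.std()  # estimate and spread
```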
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
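A brute-force illustration of the underlying question, assuming an explicit list of candidate models with binary 0/1 predictions (the paper provides tractable algorithms over whole model classes, which this sketch does not reproduce):
```python
import numpy as np

def disparity_range_over_good_models(models, X, y, group, eps=0.01):
    """Among candidate models whose error is within `eps` of the best candidate,
    return the min and max group disparity, measured as the absolute difference
    in positive-prediction rates between group 0 and group 1."""
    errors = np.array([np.mean(m.predict(X) != y) for m in models])
    good = [m for m, e in zip(models, errors) if e <= errors.min() + eps]
    disparities = []
    for m in good:
        pred = m.predict(X)  # assumed to return 0/1 labels
        disparities.append(abs(pred[group == 0].mean() - pred[group == 1].mean()))
    return min(disparities), max(disparities)
```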
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows performance competitive with the state of the art on real-world and synthetic data.
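For orientation, the standard doubly robust (AIPW) estimator of the average treatment effect, on which such methods build (this is the textbook estimator, not the paper's representation-learning algorithm):
```python
import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """Classic doubly robust (AIPW) estimate of the average treatment effect.

    y: observed outcomes; t: binary treatment indicator (0/1)
    mu0, mu1: outcome-model predictions under control / treatment
    e: estimated propensity scores P(T=1 | X)
    The estimate is consistent if either the outcome model or the propensity
    model is correctly specified -- the "double robustness" property.
    """
    dr1 = mu1 + t * (y - mu1) / e
    dr0 = mu0 + (1 - t) * (y - mu0) / (1 - e)
    return float(np.mean(dr1 - dr0))
```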
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
- Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and a Path to Best Practices for Machine Learning in Chemistry [0.0]
This review aims to draw attention to two issues of concern when we set out to make machine learning benchmarking work in the chemical and materials domain.
These topics are often overlooked or underappreciated, as chemists typically have only limited training in statistics.
These metrics are also key to comparing the performance of different models and thus for developing guidelines and best practices for the successful application of machine learning in chemistry.
arXiv Detail & Related papers (2020-09-30T21:19:17Z)
- ALEX: Active Learning based Enhancement of a Model's Explainability [34.26945469627691]
An active learning (AL) algorithm seeks to construct an effective classifier with a minimal number of labeled examples in a bootstrapping manner.
In the era of data-driven learning, this is an important research direction to pursue.
This paper describes our work in progress towards developing an AL selection function that, in addition to model effectiveness, also seeks to improve the interpretability of the model during the bootstrapping steps.
arXiv Detail & Related papers (2020-09-02T07:15:39Z)