Related papers: On (assessing) the fairness of risk score models

URL: http://arxiv.org/abs/2302.08851v1
Date: Fri, 17 Feb 2023 12:45:51 GMT
Title: On (assessing) the fairness of risk score models
Authors: Eike Petersen, Melanie Ganz, Sune Hannibal Holm, Aasa Feragen
Abstract summary: Risk models are of interest for a number of reasons, including the fact that they communicate uncertainty about the potential outcomes to users. We identify the provision of similar epistemic value to different groups as a key desideratum for risk score fairness. We introduce a novel calibration error metric that is less sample size-biased than previously proposed metrics.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent work on algorithmic fairness has largely focused on the fairness of discrete decisions, or classifications. While such decisions are often based on risk score models, the fairness of the risk models themselves has received considerably less attention. Risk models are of interest for a number of reasons, including the fact that they communicate uncertainty about the potential outcomes to users, thus representing a way to enable meaningful human oversight. Here, we address fairness desiderata for risk score models. We identify the provision of similar epistemic value to different groups as a key desideratum for risk score fairness. Further, we address how to assess the fairness of risk score models quantitatively, including a discussion of metric choices and meaningful statistical comparisons between groups. In this context, we also introduce a novel calibration error metric that is less sample size-biased than previously proposed metrics, enabling meaningful comparisons between groups of different sizes. We illustrate our methodology, which is widely applicable in many other settings, in two case studies: one in recidivism risk prediction and one in predicting the risk of major depressive disorder (MDD).
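The paper's novel calibration error metric is not reproduced here. As a hedged illustration of the problem it addresses, the sketch below assumes that a standard equal-width binned expected calibration error (ECE) is representative of the "previously proposed metrics" and computes it per group. It then simulates a perfectly calibrated model (outcome drawn as Bernoulli(score)) to show that the expected binned ECE grows as the sample shrinks, so a small group can look worse calibrated than a large one even when both are calibrated perfectly. The function names `binned_ece`, `ece_by_group` and `mean_ece` are hypothetical.

```python
import random


def binned_ece(probs, labels, n_bins=10):
    """Standard equal-width binned ECE (not the paper's new metric).

    ECE = sum over bins b of (n_b / n) * |mean(labels in b) - mean(probs in b)|
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Scores are assumed to lie in [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        mean_y = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_y - mean_p)
    return ece


def ece_by_group(probs, labels, groups, n_bins=10):
    """Binned ECE computed separately for each group label."""
    by_group = {}
    for p, y, g in zip(probs, labels, groups):
        ps, ys = by_group.setdefault(g, ([], []))
        ps.append(p)
        ys.append(y)
    return {g: binned_ece(ps, ys, n_bins) for g, (ps, ys) in by_group.items()}


def mean_ece(n, repeats=100, seed=0):
    """Average binned ECE of a perfectly calibrated model on n samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        probs = [rng.random() for _ in range(n)]
        # Perfect calibration: P(y = 1 | score = p) = p.
        labels = [1 if rng.random() < p else 0 for p in probs]
        total += binned_ece(probs, labels)
    return total / repeats


if __name__ == "__main__":
    # The true calibration error is 0 at every n; any nonzero value is estimation bias.
    for n in (50, 500, 5000):
        print(n, round(mean_ece(n), 4))
```

Since the simulated model is calibrated by construction, the decreasing averages reflect only sample-size bias. This is why comparing raw binned ECE between a small and a large group can be misleading, and why the paper argues for a less size-biased metric.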

Related papers

Data-driven decision-making under uncertainty with entropic risk measure
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss. To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure. We show that cross-validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
arXiv, 2024-09-30
Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively. Our methodology supports monotone and nearly monotone risks, but otherwise makes no distributional assumptions.
arXiv, 2024-03-28
Auditing Fairness under Unobserved Confounding
We show that, surprisingly, one can still compute meaningful bounds on treatment rates for high-risk individuals. We use the fact that in many real-world settings we have data from before any allocation to derive unbiased estimates of risk.
arXiv, 2024-03-18
On the Societal Impact of Open Foundation Models
We focus on open foundation models, defined here as those with broadly available model weights. We identify five distinctive properties of open foundation models that lead to both their benefits and risks.
arXiv, 2024-02-27
Risk Aware Benchmarking of Large Language Models
We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
arXiv, 2023-10-11
In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
This paper empirically investigates the strengths and weaknesses of different model selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv, 2023-02-06
Mitigating multiple descents: A model-agnostic framework for risk monotonization
We develop a general framework for risk monotonization based on cross-validation. We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv, 2022-05-25
Two steps to risk sensitivity
Conditional value-at-risk (CVaR) is a risk measure for modeling human and animal planning. We adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR.
arXiv, 2021-11-12
Accounting for Model Uncertainty in Algorithmic Discrimination
We argue that fairness approaches should instead focus only on equalizing errors that arise from model uncertainty. We draw a connection between predictive multiplicity and model uncertainty and argue that techniques from predictive multiplicity could be used to identify errors that arise from model uncertainty.
arXiv, 2021-05-10
Characterizing Fairness Over the Set of Good Models Under Selective Labels
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance. We provide tractable algorithms to compute the range of attainable group-level predictive disparities. We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv, 2021-01-02
Feedback Effects in Repeat-Use Criminal Risk Assessments
We show that risk can propagate over sequential decisions in ways that are not captured by one-shot tests. Risk assessment tools operate in a highly complex and path-dependent process, fraught with historical inequity.
arXiv, 2020-11-28
Decision-Making with Auto-Encoding Variational Bayes
We show that a posterior approximation distinct from the variational distribution should be used for making decisions. Motivated by these theoretical results, we propose learning several approximate proposals for the best model. In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv, 2020-02-17

This list is automatically generated from the titles and abstracts of the papers on this site.
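Two of the listed papers rely on tail-sensitive risk measures: the entropic risk measure, rho_theta(L) = (1/theta) * log E[exp(theta * L)] for a loss L and risk aversion theta > 0, and conditional value-at-risk (CVaR). The minimal Python sketch below shows plain empirical estimators for both. Because log is concave, Jensen's inequality implies that the plug-in entropic estimator underestimates the true value on average, which is the kind of bias the first listed paper targets. The correction shown is the textbook bootstrap bias correction (2 * estimate - mean of bootstrap estimates), not the strongly consistent procedure proposed in that paper. The CVaR estimator averages the worst ceil(alpha * n) losses, which is one common empirical convention among several. All function names are hypothetical.

```python
import math
import random


def entropic_risk(losses, theta):
    """Plug-in estimate of (1/theta) * log E[exp(theta * L)] for theta > 0.

    A max-shift (log-sum-exp) keeps exp() from overflowing for large theta * loss.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    m = max(theta * x for x in losses)
    s = sum(math.exp(theta * x - m) for x in losses)
    return (m + math.log(s / len(losses))) / theta


def bootstrap_bias_corrected(losses, theta, n_boot=500, seed=0):
    """Textbook bootstrap bias correction: 2 * estimate - mean(bootstrap estimates).

    Generic technique, not the procedure proposed in the listed paper.
    """
    rng = random.Random(seed)
    est = entropic_risk(losses, theta)
    boots = [
        entropic_risk(rng.choices(losses, k=len(losses)), theta)
        for _ in range(n_boot)
    ]
    return 2 * est - sum(boots) / n_boot


def empirical_cvar(losses, alpha):
    """Average of the worst ceil(alpha * n) losses (upper tail of mass alpha)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # The small tolerance guards against float noise such as 0.3 * 10 = 3.0000000000000004.
    k = max(1, math.ceil(alpha * len(losses) - 1e-9))
    return sum(sorted(losses, reverse=True)[:k]) / k


if __name__ == "__main__":
    rng = random.Random(42)
    theta = 1.0  # for L ~ N(0, 1) the exact entropic risk is theta / 2 = 0.5
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
    print("plug-in entropic risk:", round(entropic_risk(sample, theta), 3))
    print("bootstrap-corrected:  ", round(bootstrap_bias_corrected(sample, theta), 3))
    print("empirical CVaR(0.1):  ", round(empirical_cvar(sample, 0.1), 3))
```

A standard normal loss is used in the usage example because its entropic risk has the closed form theta / 2, which gives a known target to compare the estimates against.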