Leveraging Expert Consistency to Improve Algorithmic Decision Support
- URL: http://arxiv.org/abs/2101.09648v3
- Date: Mon, 3 Jun 2024 15:23:05 GMT
- Title: Leveraging Expert Consistency to Improve Algorithmic Decision Support
- Authors: Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova
- Abstract summary: We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
- Score: 62.61153549123407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.
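To make the two-step methodology concrete, here is a minimal sketch in Python. Note that `estimate_consistency` is a hypothetical stand-in that scores each case by the probability a model of expert decisions assigns to the decision actually made; the paper's actual estimator is based on influence functions. The threshold `tau` and all function names are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_consistency(X, expert_decisions):
    """Rough stand-in for the paper's influence-function estimator:
    fit a model of expert decisions, then score each case by the
    probability assigned to the decision the expert actually made.
    A high score suggests the decision is consistent with how
    experts handle similar cases."""
    model = LogisticRegression(max_iter=1000).fit(X, expert_decisions)
    proba = model.predict_proba(X)
    # Column index of each case's observed decision in model.classes_.
    cols = np.searchsorted(model.classes_, expert_decisions)
    return proba[np.arange(len(expert_decisions)), cols]

def amalgamate_labels(expert_decisions, observed_outcomes, consistency, tau=0.8):
    """Label amalgamation: learn from the expert where estimated
    consistency is high, and from the observed outcome otherwise."""
    return np.where(consistency >= tau, expert_decisions, observed_outcomes)

# Usage sketch: train the decision-support model on amalgamated labels.
# X, d, o = features, expert decisions (0/1), observed outcomes (0/1)
# y = amalgamate_labels(d, o, estimate_consistency(X, d))
# support_model = LogisticRegression(max_iter=1000).fit(X, y)
```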
Related papers
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions [4.362723406385396]
We show how multiverse analysis can be used to better understand fairness implications of design and evaluation decisions.
Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model.
arXiv Detail & Related papers (2023-08-31T12:32:43Z)
- Rational Decision-Making Agent with Internalized Utility Judgment [91.80700126895927]
Large language models (LLMs) have advanced remarkably, attracting significant efforts to develop them into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications.
This paper proposes RadAgent, which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning.
Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, achieving over 10% improvement in Pass Rate on diverse tasks.
arXiv Detail & Related papers (2023-08-24T03:11:45Z)
- Topological Interpretability for Deep-Learning [0.30806551485143496]
Deep learning (DL) models typically cannot quantify the certainty of their predictions.
This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text.
arXiv Detail & Related papers (2023-05-15T13:38:13Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Exploiting Meta-Cognitive Features for a Machine-Learning-Based One-Shot Group-Decision Aggregation [0.7340017786387767]
Methods that rely on meta-cognitive information, such as confidence-based methods, have shown improvements in various tasks.
We aim to exploit and learn from meta-cognitive information in order to enhance the group's ability to produce a correct answer.
arXiv Detail & Related papers (2022-01-20T15:56:18Z)
- A Machine Learning Framework Towards Transparency in Experts' Decision Quality [0.0]
In many important settings, transparency in experts' decision quality is rarely possible because ground truth data for evaluating the experts' decisions is costly and available only for a limited set of decisions.
We first formulate the problem of estimating experts' decision accuracy in this setting and then develop a machine-learning-based framework to address it.
Our method effectively leverages both abundant historical data on workers' past decisions and scarce decision instances with ground truth information.
arXiv Detail & Related papers (2021-10-21T18:50:40Z)
- Decision Rule Elicitation for Domain Adaptation [93.02675868486932]
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels from experts.
In this work, we allow experts to additionally produce decision rules describing their decision-making.
We show that decision rule elicitation improves domain adaptation of the algorithm and helps propagate experts' knowledge to the AI model.
arXiv Detail & Related papers (2021-02-23T08:07:22Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- How fair can we go in machine learning? Assessing the boundaries of fairness in decision trees [0.12891210250935145]
We present the first methodology that makes it possible to explore the statistical limits of bias mitigation interventions.
We focus our study on decision tree classifiers since they are widely accepted in machine learning.
Our experiments show that our method can make decision tree models fairer at a small cost in classification error.
arXiv Detail & Related papers (2020-06-22T16:28:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.