Leveraging Expert Consistency to Improve Algorithmic Decision Support
- URL: http://arxiv.org/abs/2101.09648v3
- Date: Mon, 3 Jun 2024 15:23:05 GMT
- Title: Leveraging Expert Consistency to Improve Algorithmic Decision Support
- Authors: Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova
- Abstract summary: We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
- Score: 62.61153549123407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.
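The label amalgamation step described in the abstract lends itself to a quick illustration. The sketch below is a minimal, hypothetical reading of that idea: given one expert decision and one observed outcome per case, plus a per-case estimate of expert consistency (which the paper obtains indirectly via influence functions; that estimation step is not reproduced here), the two signals are blended into a soft training label. The function name `amalgamate_labels` and the convex-combination weighting are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def amalgamate_labels(expert_decisions, observed_outcomes, consistency):
    """Blend expert decisions and observed outcomes into one training label per case.

    consistency[i] is an estimate (in [0, 1]) of how consistent experts would be
    with each other on case i: where experts are expected to agree, lean on the
    expert decision; otherwise fall back to the observed outcome. This weighting
    rule is an illustrative choice, not the paper's exact amalgamation scheme.
    """
    expert_decisions = np.asarray(expert_decisions, dtype=float)
    observed_outcomes = np.asarray(observed_outcomes, dtype=float)
    consistency = np.clip(np.asarray(consistency, dtype=float), 0.0, 1.0)
    return consistency * expert_decisions + (1.0 - consistency) * observed_outcomes

# Toy usage: features X, one expert decision and one observed outcome per case,
# and per-case consistency estimates (in the paper these are inferred indirectly,
# since each case is assessed by a single expert).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
expert_decisions = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
observed_outcomes = (X[:, 0] + X[:, 1] + 0.6 * rng.normal(size=200) > 0).astype(int)
consistency = rng.uniform(0.2, 0.9, size=200)

soft_labels = amalgamate_labels(expert_decisions, observed_outcomes, consistency)
# Train on hard thresholded labels; one could instead use sample weights or a
# model that accepts soft targets.
model = LogisticRegression().fit(X, (soft_labels >= 0.5).astype(int))
```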
Related papers
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as general task solvers, they lag behind state-of-the-art (supervised) methods on OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions [4.362723406385396]
We show how multiverse analysis can be used to better understand fairness implications of design and evaluation decisions.
Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model.
arXiv Detail & Related papers (2023-08-31T12:32:43Z) - Rational Decision-Making Agent with Internalized Utility Judgment [91.80700126895927]
Large language models (LLMs) have demonstrated remarkable advancements and have attracted significant efforts to develop LLMs into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications.
This paper proposes RadAgent, which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning.
Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, achieving over 10% improvement in Pass Rate on diverse tasks.
arXiv Detail & Related papers (2023-08-24T03:11:45Z) - Topological Interpretability for Deep-Learning [0.30806551485143496]
Deep learning (DL) models cannot quantify the certainty of their predictions.
This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text.
arXiv Detail & Related papers (2023-05-15T13:38:13Z) - Exploiting Meta-Cognitive Features for a Machine-Learning-Based One-Shot Group-Decision Aggregation [0.7340017786387767]
Methods that rely on meta-cognitive information, such as confidence-based methods, have shown improvements in various tasks.
Our aim is to exploit and learn from meta-cognitive information in order to enhance the group's ability to produce a correct answer.
arXiv Detail & Related papers (2022-01-20T15:56:18Z) - A Machine Learning Framework Towards Transparency in Experts' Decision Quality [0.0]
In many important settings, transparency in experts' decision quality is rarely possible because ground truth data for evaluating the experts' decisions is costly and available only for a limited set of decisions.
We first formulate the problem of estimating experts' decision accuracy in this setting and then develop a machine-learning-based framework to address it.
Our method effectively leverages both abundant historical data on workers' past decisions and scarce decision instances with ground truth information.
arXiv Detail & Related papers (2021-10-21T18:50:40Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - How fair can we go in machine learning? Assessing the boundaries of fairness in decision trees [0.12891210250935145]
We present the first methodology that makes it possible to explore the statistical limits of bias mitigation interventions.
We focus our study on decision tree classifiers since they are widely accepted in machine learning.
We conclude experimentally that our method can make decision tree models fairer at a small cost in classification error.
arXiv Detail & Related papers (2020-06-22T16:28:26Z)
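The fairness-versus-error trade-off described in the last entry above can be made concrete with a small, generic sketch: train decision trees of varying depth and report accuracy alongside a demographic parity gap. This is only an illustration of the kind of trade-off being explored; the helper `demographic_parity_difference`, the synthetic data, and the depth sweep are assumptions of this sketch, not the paper's methodology for assessing the statistical limits of bias mitigation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def demographic_parity_difference(y_pred, group):
    """Absolute difference in positive prediction rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Synthetic data: the sensitive attribute is correlated with one feature, so an
# unconstrained tree tends to produce unequal positive rates across groups.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 4)) + 0.8 * group[:, None] * np.array([1, 0, 0, 0])
y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=500) > 0).astype(int)

# Sweep tree depth to see how classification error and the fairness gap move together.
for depth in (2, 4, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    pred = tree.predict(X)
    print(depth,
          round(accuracy_score(y, pred), 3),
          round(demographic_parity_difference(pred, group), 3))
```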
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.