Generative Models, Humans, Predictive Models: Who Is Worse at High-Stakes Decision Making?
- URL: http://arxiv.org/abs/2410.15471v2
- Date: Fri, 14 Feb 2025 05:41:23 GMT
- Title: Generative Models, Humans, Predictive Models: Who Is Worse at High-Stakes Decision Making?
- Authors: Keri Mallari, Julius Adebayo, Kori Inkpen, Martin T. Wells, Albert Gordo, Sarah Tan
- Abstract summary: Large generative models (LMs) are already being used for decision making tasks that were previously done by predictive models or humans.
We put popular LMs to the test in a high-stakes decision making task: recidivism prediction.
- Score: 10.225573060836478
- License:
- Abstract: Despite strong advisory against it, large generative models (LMs) are already being used for decision making tasks that were previously done by predictive models or humans. We put popular LMs to the test in a high-stakes decision making task: recidivism prediction. Studying three closed-access and open-source LMs, we analyze the LMs not exclusively in terms of accuracy, but also in terms of agreement with (imperfect, noisy, and sometimes biased) human predictions or existing predictive models. We conduct experiments that assess how providing different types of information, including distractor information such as photos, can influence LM decisions. We also stress test techniques designed to either increase accuracy or mitigate bias in LMs, and find that some have unintended consequences on LM decisions. Our results provide additional quantitative evidence for the prevailing wisdom that current LMs are not the right tools for these types of tasks.
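To make the experimental setup concrete, below is a minimal, hypothetical sketch of this kind of experiment: query a chat-style LM for a binary recidivism prediction from a structured defendant record, optionally including a distractor field, and measure agreement with a set of reference decisions. The model name, prompt wording, and record fields are placeholder assumptions, not the paper's actual prompts, models, or data.

```python
# Hedged sketch of the kind of experiment described above: ask a chat LM for a
# binary recidivism prediction, optionally with a distractor field, and measure
# agreement with reference decisions (human or predictive-model labels).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def predict_recidivism(record: dict, include_distractor: bool = False) -> int:
    """Return the LM's 0/1 prediction for a single defendant record."""
    fields = {k: v for k, v in record.items()
              if include_distractor or k != "photo_description"}
    prompt = (
        "Based on the following defendant record, answer with a single word, "
        "'yes' or 'no': will this person reoffend within two years?\n"
        + "\n".join(f"{k}: {v}" for k, v in fields.items())
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper studies several closed and open LMs
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = reply.choices[0].message.content.strip().lower()
    return 1 if answer.startswith("yes") else 0


def agreement(lm_preds: list[int], reference: list[int]) -> float:
    """Fraction of cases where the LM matches human or predictive-model decisions."""
    return sum(p == r for p, r in zip(lm_preds, reference)) / len(reference)
```

Running the same records with and without the distractor field, and computing agreement() against both human decisions and an existing predictive model's outputs, mirrors the comparisons the abstract describes.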
Related papers
- Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models.
We validate this approach using four standard NLP benchmarks.
We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z)
- Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z)
- Self-Recognition in Language Models [10.649471089216489]
We propose a novel approach for assessing self-recognition in LMs using model-generated "security questions".
We use our test to examine self-recognition in ten of the most capable open- and closed-source LMs currently publicly available.
Our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin.
arXiv Detail & Related papers (2024-07-09T15:23:28Z)
- Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead (a toy sketch follows this entry).
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z)
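As a rough illustration of the idea in the entry above, the sketch below trains a linear probe on frozen hidden states of a small open model to predict whether an answer is correct, and uses the probe's probability as a confidence estimate. The model name, toy data, and probe choice are assumptions for illustration; the paper's actual recipe (e.g., fine-tuning the LM itself) may differ.

```python
# Hedged sketch: learn an uncertainty estimate from (answer, is_correct) labels
# by probing frozen hidden states of a small model. Model choice and features
# are illustrative; the paper's actual recipe may fine-tune the LM itself.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pooled last hidden states used as frozen features."""
    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state       # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding tokens
        return (hidden * mask).sum(1) / mask.sum(1)

# Toy calibration set: question + model answer, labeled correct (1) or not (0).
qa_pairs = ["Q: 2+2? A: 4", "Q: capital of France? A: Berlin"]
is_correct = [1, 0]

probe = LogisticRegression().fit(embed(qa_pairs).numpy(), is_correct)
confidence = probe.predict_proba(embed(["Q: 3*3? A: 9"]).numpy())[:, 1]
print(f"estimated probability the answer is correct: {confidence[0]:.2f}")
```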
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
Previous work shows that introducing an extra calibration task can mitigate PLMs' poor confidence calibration, but making a PLM serve as both a task-solver and its own calibrator raises practical challenges.
We propose a training algorithm, LM-TOAST, to tackle these challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Evidence > Intuition: Transferability Estimation for Encoder Selection [16.490047604583882]
We generate quantitative evidence to predict which LM will perform best on a target task without having to fine-tune all candidates.
We adopt the state-of-the-art Logarithm of Maximum Evidence (LogME) measure from Computer Vision (CV) and find that it positively correlates with final LM performance in 94% of setups (a formal sketch of the measure follows this entry).
arXiv Detail & Related papers (2022-10-20T13:25:21Z)
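For context, a minimal sketch of the LogME quantity under its standard Bayesian linear-model assumptions, where F are frozen encoder features, y the target labels, and alpha and beta the prior and noise precisions; this is the generic definition of the measure, not this entry's specific experimental setup.

```latex
% LogME: log of the maximized marginal likelihood (evidence) of a Bayesian
% linear model fit on frozen encoder features F (n x D) with targets y (n).
\[
\begin{aligned}
p(y \mid F, \alpha, \beta) &= \int p(w \mid \alpha)\, p(y \mid F, w, \beta)\, dw,
\qquad w \sim \mathcal{N}(0, \alpha^{-1} I),\quad y \mid F, w \sim \mathcal{N}(Fw, \beta^{-1} I),\\
\operatorname{LogME}(F, y) &= \frac{1}{n}\,\max_{\alpha,\beta > 0}\, \log p(y \mid F, \alpha, \beta).
\end{aligned}
\]
```

A higher value means the features explain the labels well under the evidence-maximizing linear model, which is why the score can rank candidate encoders before any fine-tuning.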
- How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations [2.7708222692419735]
Explanations are seldom evaluated based on their true practical impact on decision-making tasks.
This study proposes XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information.
Using strong statistical analysis, we show that, in general, popular explainers have a worse impact than desired.
arXiv Detail & Related papers (2021-01-21T18:15:13Z)
- When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making [68.19284302320146]
We carry out user studies to assess how people with differing levels of expertise respond to different types of predictive uncertainty.
We find that showing posterior predictive distributions leads to smaller disagreements with the ML model's predictions.
This suggests that posterior predictive distributions can potentially serve as useful decision aids, though they should be used with caution, taking into account the type of distribution and the expertise of the human.
arXiv Detail & Related papers (2020-11-12T02:23:53Z)
- An Information-Theoretic Approach to Personalized Explainable Machine Learning [92.53970625312665]
We propose a simple probabilistic model for the predictions and user knowledge.
We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction (see the sketch after this entry).
arXiv Detail & Related papers (2020-03-01T13:06:29Z)
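A generic statement of that quantity, here conditioned on the user's background knowledge U in the spirit of the entry above, with E the explanation and \hat{Y} the prediction; the notation is illustrative and not necessarily the paper's.

```latex
% Conditional mutual information between explanation E and prediction \hat{Y},
% given the user's background knowledge U: how much the explanation reduces the
% user's remaining uncertainty about the prediction.
\[
I(E; \hat{Y} \mid U)
  = \mathbb{E}\!\left[\log \frac{p(E, \hat{Y} \mid U)}{p(E \mid U)\, p(\hat{Y} \mid U)}\right]
  = H(\hat{Y} \mid U) - H(\hat{Y} \mid E, U).
\]
```

A larger value means the explanation conveys more information about the prediction beyond what the user already knows, which is the sense in which such explanations are personalized.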