Benchmarking the rationality of AI decision making using the transitivity axiom
- URL: http://arxiv.org/abs/2502.10554v1
- Date: Fri, 14 Feb 2025 20:56:40 GMT
- Title: Benchmarking the rationality of AI decision making using the transitivity axiom
- Authors: Kiwon Song, James M. Jennings III, Clintin P. Davis-Stober
- Abstract summary: We evaluate the rationality of AI responses via a series of choice experiments designed to evaluate transitivity of preference in humans. We found that the Llama 2 and 3 models generally satisfied transitivity, but when violations did occur, they occurred only in the Chat/Instruct versions of the LLMs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fundamental choice axioms, such as transitivity of preference, provide testable conditions for determining whether human decision making is rational, i.e., consistent with a utility representation. Recent work has demonstrated that AI systems trained on human data can exhibit similar reasoning biases as humans and that AI can, in turn, bias human judgments through AI recommendation systems. We evaluate the rationality of AI responses via a series of choice experiments designed to evaluate transitivity of preference in humans. We considered ten versions of Meta's Llama 2 and 3 LLM models. We applied Bayesian model selection to evaluate whether these AI-generated choices violated two prominent models of transitivity. We found that the Llama 2 and 3 models generally satisfied transitivity, but when violations did occur, they occurred only in the Chat/Instruct versions of the LLMs. We argue that rationality axioms, such as transitivity of preference, can be useful for evaluating and benchmarking the quality of AI-generated responses and provide a foundation for understanding computational rationality in AI systems more generally.
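As a rough illustration of what testing transitivity on choice data can look like, the sketch below checks weak stochastic transitivity on pairwise choice proportions. The abstract does not name the two transitivity models the authors tested, so the use of weak stochastic transitivity here is an assumption. The function name `violates_wst` and the choice proportions are hypothetical. This is a descriptive check only, not the Bayesian model selection the paper applied.

```python
from itertools import permutations


def violates_wst(p, items):
    """Return ordered triples (a, b, c) that violate weak stochastic transitivity.

    p[(a, b)] is the proportion of trials on which a was chosen over b.
    Weak stochastic transitivity (assumed here, not confirmed by the abstract):
    if P(a over b) >= 0.5 and P(b over c) >= 0.5, then P(a over c) >= 0.5.
    """
    def prob(a, b):
        # Look up P(a over b); fall back to the complement of P(b over a).
        return p[(a, b)] if (a, b) in p else 1.0 - p[(b, a)]

    violations = []
    for a, b, c in permutations(items, 3):
        if prob(a, b) >= 0.5 and prob(b, c) >= 0.5 and prob(a, c) < 0.5:
            violations.append((a, b, c))
    return violations


# Hypothetical choice proportions over three options (illustrative data only).
transitive = {("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.9}
cyclic = {("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.2}

print(violates_wst(transitive, ["A", "B", "C"]))
print(violates_wst(cyclic, ["A", "B", "C"]))
```

The first data set is consistent with the ordering A, B, C and yields no violations. The second forms a preference cycle (A over B, B over C, C over A), so each rotation of the cycle is reported as a violating triple.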
Related papers
- Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood. We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z)
- Hacking a surrogate model approach to XAI [49.1574468325115]
We show that even if a discriminated subgroup does not get a positive decision from the black box ADM system, the corresponding question of group membership can be pushed down onto a level as low as wanted.
Our approach can be generalized easily to other surrogate models.
arXiv Detail & Related papers (2024-06-24T13:18:02Z)
- Aligning Large Language Models from Self-Reference AI Feedback with one General Principle [61.105703857868775]
We propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback.
Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference.
Finally, we determine which answer better fits human preferences according to the criticism.
arXiv Detail & Related papers (2024-06-17T03:51:46Z)
- Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies [0.43981305860983716]
We show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone.
We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail.
arXiv Detail & Related papers (2024-03-18T01:04:52Z)
- A Decision Theoretic Framework for Measuring AI Reliance [23.353778024330165]
Humans frequently make decisions with the aid of artificially intelligent (AI) systems.
Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance.
We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's recommendation.
arXiv Detail & Related papers (2024-01-27T09:13:09Z)
- Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback.
We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks.
Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z)
- Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
- On Explainability in AI-Solutions: A Cross-Domain Survey [4.394025678691688]
In automatically deriving a system model, AI algorithms learn relations in data that are not detectable for humans.
The more complex a model, the more difficult it is for a human to understand the reasoning for the decisions.
This work provides an extensive survey of literature on this topic, which, to a large part, consists of other surveys.
arXiv Detail & Related papers (2022-10-11T06:21:47Z)
- A Human-Centric Assessment Framework for AI [11.065260433086024]
There is no agreed standard on how explainable AI systems should be assessed.
Inspired by the Turing test, we introduce a human-centric assessment framework.
This setup can serve as a framework for a wide range of human-centric AI system assessments.
arXiv Detail & Related papers (2022-05-25T12:59:13Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
- Indecision Modeling [50.00689136829134]
It is important that AI systems act in ways which align with human values.
People are often indecisive, and especially so when their decision has moral implications.
arXiv Detail & Related papers (2020-12-15T18:32:37Z)
- Challenging common interpretability assumptions in feature attribution explanations [0.0]
We empirically evaluate the veracity of three common interpretability assumptions through a large scale human-subjects experiment.
We find that feature attribution explanations provide marginal utility in our task for a human decision maker.
arXiv Detail & Related papers (2020-12-04T17:57:26Z)
- Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z)
- Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.