Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
- URL: http://arxiv.org/abs/2509.04445v1
- Date: Thu, 04 Sep 2025 17:59:29 GMT
- Title: Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
- Authors: Cyrus Cousins, Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong
- Abstract summary: We present an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. We show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
- Score: 35.506295119331064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent AI work trends towards incorporating human-centric objectives, with the explicit goal of aligning AI models to personal preferences and societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, which are then used to align AI behavior with that of humans. However, models commonly used in such elicitation processes often do not capture the true cognitive processes of human decision making, such as when people use heuristics to simplify information associated with a decision problem. As a result, models learned from people's decisions often do not align with their cognitive processes, and cannot be used to validate the learning framework for generalization to other decision-making tasks. To address this limitation, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the vast literature characterizing the cognitive processes that contribute to human decision-making, and recent work characterizing such processes in pairwise comparison tasks, we define a class of models in which individual features are first processed and compared across alternatives, and the processed features are then aggregated via a fixed rule, such as the Bradley-Terry rule. This structured processing of information ensures such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach in learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
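The two-stage structure described in the abstract (process each feature separately, compare across alternatives, then aggregate with a fixed rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choice_probability`, the example feature maps, and the toy patient features are all hypothetical, and the aggregation shown is simply the standard Bradley-Terry form, where the probability of choosing a over b is a logistic function of the summed per-feature differences.

```python
import math
from typing import Callable, Sequence


def choice_probability(
    a: Sequence[float],
    b: Sequence[float],
    feature_maps: Sequence[Callable[[float], float]],
) -> float:
    """Probability that alternative a is chosen over alternative b.

    Hypothetical sketch: each feature i has its own processing function
    f_i, and the processed values are compared by taking a difference.
    """
    # Stage 1: process each feature on its own, then compare across alternatives.
    diffs = [f(x) - f(y) for f, x, y in zip(feature_maps, a, b)]
    # Stage 2: aggregate with a fixed rule. Summing and applying the logistic
    # function is the Bradley-Terry rule with strengths exp(sum_i f_i(x_i)).
    return 1.0 / (1.0 + math.exp(-sum(diffs)))


# Assumed, illustrative feature maps (not taken from the paper):
# a threshold heuristic on age and a capped count of dependents.
feature_maps = [
    lambda age: 1.0 if age < 50 else 0.0,
    lambda dependents: 0.5 * min(dependents, 2),
]

patient_a = (30, 2)  # (age, number of dependents), made-up values
patient_b = (60, 0)
p_a_over_b = choice_probability(patient_a, patient_b, feature_maps)
```

Because every feature passes through its own simple function before aggregation, each learned function can be inspected on its own, which is the kind of interpretability the abstract points to. A step function such as the age threshold above is one way a heuristic might show up in such a model.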
Related papers
- When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration [79.69935257008467]
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for Human-AI knowledge transfer capabilities. We conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating model explanations' influence on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z)
- Utilizing Human Behavior Modeling to Manipulate Explanations in AI-Assisted Decision Making: The Good, the Bad, and the Scary [19.884253335528317]
Recent advances in AI models have increased the integration of AI-based decision aids into the human decision making process.
To fully unlock the potential of AI-assisted decision making, researchers have computationally modeled how humans incorporate AI recommendations into their final decisions.
Providing AI explanations to human decision makers to help them rely on AI recommendations more appropriately has become a common practice.
arXiv Detail & Related papers (2024-11-02T18:33:28Z)
- Exploring the Lands Between: A Method for Finding Differences between AI-Decisions and Human Ratings through Generated Samples [45.209635328908746]
We propose a method to find samples in the latent space of a generative model.
By presenting those samples to both the decision-making model and human raters, we can identify areas where its decisions align with human intuition.
We apply this method to a face recognition model and collect a dataset of 11,200 human ratings from 100 participants.
arXiv Detail & Related papers (2024-09-19T14:14:08Z)
- Explain To Decide: A Human-Centric Review on the Role of Explainable Artificial Intelligence in AI-assisted Decision Making [1.0878040851638]
Machine learning models are error-prone and cannot be used autonomously.
Explainable Artificial Intelligence (XAI) aids end-user understanding of the model.
This paper surveyed the recent empirical studies on XAI's impact on human-AI decision-making.
arXiv Detail & Related papers (2023-12-11T22:35:21Z)
- Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z)
- From DDMs to DNNs: Using process data and models of decision-making to improve human-AI interactions [1.024113475677323]
We argue that artificial intelligence (AI) research would benefit from a stronger focus on insights about how decisions emerge over time. First, we introduce a highly established computational framework that assumes decisions to emerge from the noisy accumulation of evidence. Next, we discuss to what extent current approaches in multi-agent AI do or do not incorporate process data and models of decision making.
arXiv Detail & Related papers (2023-08-29T11:27:22Z)
- Modeling Human Behavior Part II -- Cognitive approaches and Uncertainty [0.0]
In Part I, we discussed methods which generate a model of behavior from exploration of the system and feedback based on the exhibited behavior.
In this work, we will continue the discussion from the perspective of methods which focus on the assumed cognitive abilities, limitations, and biases demonstrated in human reasoning.
arXiv Detail & Related papers (2022-05-13T07:29:15Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Indecision Modeling [50.00689136829134]
It is important that AI systems act in ways which align with human values.
People are often indecisive, and especially so when their decision has moral implications.
arXiv Detail & Related papers (2020-12-15T18:32:37Z)