Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?
- URL: http://arxiv.org/abs/2601.22329v1
- Date: Thu, 29 Jan 2026 21:17:06 GMT
- Title: Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?
- Authors: Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Fatemeh Bahrani, Ashutosh Chaubey, Sai Praneeth Karimireddy, Norbert Schwarz, Jonathan Gratch
- Abstract summary: Large Language Models are increasingly positioned as decision engines for hiring, healthcare, and economic judgment. It is critical to assess whether they exhibit analogous patterns of (ir)rationalities and biases.
- Score: 10.367910587365529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly positioned as decision engines for hiring, healthcare, and economic judgment, yet real-world human judgment reflects a balance between rational deliberation and emotion-driven bias. If LLMs are to participate in high-stakes decisions or serve as models of human behavior, it is critical to assess whether they exhibit analogous patterns of (ir)rationalities and biases. To this end, we evaluate multiple LLM families on (i) benchmarks testing core axioms of rational choice and (ii) classic decision domains from behavioral economics and social norms where emotions are known to shape judgment and choice. Across settings, we show that deliberate "thinking" reliably improves rationality and pushes models toward expected-value maximization. To probe human-like affective distortions and their interaction with reasoning, we use two emotion-steering methods: in-context priming (ICP) and representation-level steering (RLS). ICP induces strong directional shifts that are often extreme and difficult to calibrate, whereas RLS produces more psychologically plausible patterns but with lower reliability. Our results suggest that the same mechanisms that improve rationality also amplify sensitivity to affective interventions, and that different steering methods trade off controllability against human-aligned behavior. Overall, this points to a tension between reasoning and affective steering, with implications for both human simulation and the safe deployment of LLM-based decision systems.
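For intuition, here is a minimal sketch of the two steering routes the abstract contrasts: in-context priming (ICP), which simply prepends an affective instruction to the prompt, and representation-level steering (RLS), which adds an emotion direction to a layer's hidden activations during the forward pass. The model name, layer index, steering coefficient, and the random placeholder "anger" direction below are illustrative assumptions rather than details from the paper; a real RLS direction would be estimated from contrasts between emotional and neutral activations.

```python
# Minimal sketch of the two emotion-steering methods contrasted in the abstract.
# Assumed, not from the paper: the model ("gpt2"), the layer index, the steering
# coefficient, and the random placeholder "anger" direction (a real direction
# would be estimated from emotional-vs-neutral activation contrasts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# A rational-choice item: the expected-value-maximizing answer is Option B
# (0.8 * 45 = 36 > 30).
lottery = ("Option A: $30 for sure. Option B: an 80% chance of $45, otherwise $0. "
           "Which option do you choose?")

# In-context priming (ICP): steer by prepending an affective instruction.
icp_prompt = "You are feeling intense anger right now. " + lottery

# Representation-level steering (RLS): add a direction to a layer's hidden states.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()
coeff = 4.0  # steering strength (illustrative)

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return (output[0] + coeff * direction.to(output[0].dtype),) + output[1:]

handle = model.transformer.h[6].register_forward_hook(steer)

def answer(prompt):
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**ids, max_new_tokens=20, do_sample=False)
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

rls_answer = answer(lottery)     # hook active: representation-level steering
handle.remove()
icp_answer = answer(icp_prompt)  # hook removed: plain in-context priming
print("RLS:", rls_answer)
print("ICP:", icp_answer)
```

Comparing the two completions against the expected-value-maximizing choice is the kind of contrast the paper runs at scale; this snippet only illustrates the mechanics, since a random direction and an off-the-shelf small model will not reproduce the reported effects.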
Related papers
- Fast, Slow, and Tool-augmented Thinking for LLMs: A Review [57.16858582049339]
Large Language Models (LLMs) have demonstrated remarkable progress in reasoning across diverse domains. Effective reasoning in real-world tasks requires adapting the reasoning strategy to the demands of the problem. We propose a novel taxonomy of LLM reasoning strategies along two knowledge boundaries.
arXiv Detail & Related papers (2025-08-17T07:20:32Z) - Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making [33.2843381902912]
Large language models are increasingly used in strategic decision-making settings. We compare LLMs and humans using experimental paradigms adapted from behavioral game-theory research.
arXiv Detail & Related papers (2025-06-11T04:43:54Z) - Who Gets the Kidney? Human-AI Alignment, Indecision, and Moral Values [36.47201247038004]
We show that Large Language Models (LLMs) exhibit stark deviations from human values in prioritizing various attributes. We also show that low-rank supervised fine-tuning with a few samples is often effective both in improving decision consistency and in calibrating indecision modeling.
arXiv Detail & Related papers (2025-05-30T01:23:11Z) - Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments [5.1382713576243955]
Large language models (LLMs) are increasingly used to simulate or automate human behavior in sequential decision-making settings. We focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We find that enabling thinking in LLMs shifts them toward more human-like behavior, characterized by a mix of random and directed exploration.
arXiv Detail & Related papers (2025-05-15T02:09:18Z) - Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making [71.71796367760112]
Large language models (LLMs) have shown potential in supporting decision-making applications. We propose a cognitive debiasing approach, self-adaptive cognitive debiasing (SACD). We evaluate SACD on finance, healthcare, and legal decision-making tasks using both open-weight and closed-weight LLMs.
arXiv Detail & Related papers (2025-04-05T11:23:05Z) - Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs). We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables LLMs to continuously optimize their value systems.
arXiv Detail & Related papers (2025-04-03T06:22:19Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs).
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities; a minimal worked sketch of these prospect-theory quantities appears after this list.
arXiv Detail & Related papers (2024-06-10T02:14:19Z) - PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models [25.657579792829743]
We empirically evaluate how role-playing prompting influences Theory-of-Mind (ToM) reasoning capabilities.
We propose that, beyond the inherent variance in the complexity of reasoning tasks, performance differences arise from socially motivated differences in prompting.
arXiv Detail & Related papers (2024-03-04T17:34:34Z) - From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of pre-trained language model (PLM) reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent. Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
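As noted above, here is a worked sketch of the quantities estimated in the "Decision-Making Behavior Evaluation Framework" entry: risk preference, probability weighting, and loss aversion. The functional forms and parameter values are the standard Tversky-Kahneman (1992) ones, adopted here only as an assumption for illustration; they are the classic human estimates, not the paper's fitted values for any LLM.

```python
# Prospect-theory sketch: value function, probability weighting, and the
# subjective value of a simple gamble. Functional forms and parameters follow
# Tversky & Kahneman (1992); whether the referenced paper uses exactly these
# forms is an assumption made for illustration.

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value function: concave for gains, convex and steeper (loss-averse) for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: small probabilities are overweighted."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

def prospect_value(gamble):
    """Subjective value of a simple gamble given as [(outcome, probability), ...]."""
    return sum(weight(p) * value(x) for x, p in gamble)

# A small chance of a large gain, compared against its expected value:
gamble = [(100.0, 0.05), (0.0, 0.95)]
ev = sum(x * p for x, p in gamble)  # 5.0
pv = prospect_value(gamble)         # exceeds value(ev) because w(0.05) > 0.05
print(ev, pv, value(ev))
```

Fitting alpha, gamma, and lam to a model's choices over many such gambles is one way to quantify the risk aversion, probability overweighting, and loss aversion that the entry reports for commercial LLMs.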