The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making
- URL: http://arxiv.org/abs/2601.21439v1
- Date: Thu, 29 Jan 2026 09:17:05 GMT
- Title: The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making
- Authors: Jon Chun, Katherine Elkins
- Abstract summary: Large Language Models (LLMs) are widely documented to be sensitive to minor prompt perturbations and prone to sycophantic alignment with user biases. We quantify a robustness gap where LLMs demonstrate 110-300 times greater resistance to narrative manipulation than human subjects. We show that while LLMs may be "brittle" to how a query is formatted, they are remarkably "stable" against why a decision should be biased.
- Score: 1.0671844383558033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Large Language Models (LLMs) are widely documented to be sensitive to minor prompt perturbations and prone to sycophantic alignment with user biases, their robustness in consequential, rule-bound decision-making remains under-explored. In this work, we uncover a striking "Paradox of Robustness": despite their known lexical brittleness, instruction-tuned LLMs exhibit near-total behavioral invariance to emotional framing effects. Using a novel controlled perturbation framework across three high-stakes domains (healthcare, law, and finance), we quantify a robustness gap in which LLMs demonstrate 110-300 times greater resistance to narrative manipulation than human subjects. Specifically, we find a near-zero effect size for models (Cohen's h = 0.003) compared to the substantial biases observed in humans (Cohen's h in [0.3, 0.8]). This result is highly counterintuitive and suggests that the mechanisms driving sycophancy and prompt sensitivity do not necessarily translate into failures of logical constraint satisfaction. This invariance persists across models with diverse training paradigms: while LLMs may be "brittle" to how a query is formatted, they are remarkably "stable" against why a decision should be biased. Our findings establish that instruction-tuned models can decouple logical rule-adherence from persuasive narratives, offering a source of decision stability that complements, and even potentially de-biases, human judgment in institutional contexts. We release the 162-scenario benchmark, code, and data on GitHub to facilitate rigorous evaluation of narrative-induced bias and robustness.
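The headline numbers above rest on Cohen's h, the standard effect size for a difference between two proportions, h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)). The following minimal Python sketch shows how such a robustness gap could be quantified; the decision rates below are invented placeholders for illustration only, not figures taken from the paper.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size between two proportions, e.g. the rate of
    rule-violating decisions under neutral vs. emotionally framed prompts."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))

# Hypothetical decision rates, assumed purely for illustration.
human_neutral, human_framed = 0.50, 0.75    # assumed human rates
model_neutral, model_framed = 0.500, 0.501  # assumed model rates

h_human = cohens_h(human_neutral, human_framed)   # ~0.52, inside the reported human range [0.3, 0.8]
h_model = cohens_h(model_neutral, model_framed)   # ~0.002, near the reported model value of 0.003

print(f"human h = {h_human:.3f}, model h = {h_model:.3f}, "
      f"robustness ratio ≈ {h_human / h_model:.0f}x")
```

With these assumed rates the ratio lands around 260x, which illustrates how effect-size ratios in the reported 110-300x range can arise from a large human framing shift against a near-flat model response.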
Related papers
- Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought? [79.86483056611105]
Reasoning LLMs generate step-by-step chains of thought before giving an answer. How robust are these reasoning traces to disruptions that occur within them? We introduce a controlled evaluation framework that perturbs a model's own CoT at fixed timesteps.
arXiv Detail & Related papers (2026-02-07T10:02:58Z)
- Same Answer, Different Representations: Hidden instability in VLMs [65.36933543377346]
We introduce a representation-aware and frequency-aware evaluation framework that measures internal embedding drift, spectral sensitivity, and structural smoothness. We apply this framework to modern Vision Language Models (VLMs) across the SEEDBench, MMMU, and POPE datasets.
arXiv Detail & Related papers (2026-02-06T12:24:26Z)
- Syntactic Framing Fragility: An Audit of Robustness in LLM Ethical Decisions [1.0671844383558033]
Large language models (LLMs) are increasingly deployed in consequential decision-making settings. We study whether LLMs maintain consistent ethical judgments across logically equivalent but syntactically different prompts. We introduce Syntactic Framing Fragility (SFF), a robustness evaluation framework that isolates purely syntactic effects.
arXiv Detail & Related papers (2025-12-27T18:09:34Z)
- Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation [0.3495246564946556]
Large Language Models (LLMs) have achieved remarkable success in tasks requiring complex reasoning. Do these models truly reason, or do they merely exploit shallow statistical patterns? We introduce Chain-of-Code Collapse, investigating the robustness of reasoning LLMs through a suite of semantically faithful yet adversarially structured prompt perturbations.
arXiv Detail & Related papers (2025-06-08T02:43:46Z)
- More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning [10.301985230669684]
We study the mechanisms by which semantic cues shape reasoning in large language models. We introduce MathComp, a benchmark of 300 comparison scenarios. We find that model errors frequently reflect linguistic steering, systematic shifts toward the comparative term present in the prompt.
arXiv Detail & Related papers (2025-06-04T13:15:01Z)
- Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering [14.298418197820912]
Large language models (LLMs) frequently demonstrate reasoning limitations, often conflating content plausibility with logical validity. This can result in biased inferences, where plausible arguments are incorrectly deemed logically valid or vice versa. This paper investigates the problem of mitigating content biases on formal reasoning through activation steering.
arXiv Detail & Related papers (2025-05-18T01:34:34Z)
- Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline of individual alignment, parameter-space merging, and domain-specific reinforcement learning boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
- SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.931194824519935]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in the CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed SCORE (self-consistent robust error) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning [61.93730166203915]
We argue that traditional notions of fairness are not sufficient when the model is vulnerable to adversarial attacks.
We show that measuring robustness bias is a challenging task for DNNs and propose two methods to measure this form of bias.
arXiv Detail & Related papers (2020-06-17T22:22:24Z)