Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions
- URL: http://arxiv.org/abs/2511.07669v1
- Date: Wed, 12 Nov 2025 01:10:11 GMT
- Title: Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions
- Authors: Alejandro R. Jadad,
- Abstract summary: Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by mutually reinforcing cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure.
- Score: 51.56484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by mutually reinforcing cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure. Detailed prompting specifying decision partnership and explicitly instructing avoidance of sycophancy, confabulation, solution drift, and nihilism achieved initial partnership state but failed to maintain it under operational pressure. Sustaining protective partnership state required an emergent 7-stage calibration sequence, built upon a 4-stage initialization process, within a 5-layer protection architecture enabling bias self-monitoring, human-AI adversarial challenge, partnership state verification, performance degradation detection, and stakeholder protection. Three discoveries resulted: partnership state is achievable through ordered calibration but requires emergent maintenance protocols; reliability degrades when architectural drift and context exhaustion align; and dissolution discipline prevents costly pursuit of fundamentally wrong directions. Cross-model validation revealed systematic performance differences across LLM architectures. This approach demonstrates that human-AI teams can achieve cognitive partnership capable of preventing avoidable regret in high-stakes decisions, addressing return-on-investment expectations that depend on AI systems supporting consequential decision-making without introducing preventable cognitive traps when verification arrives too late.
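To make the architecture concrete, the sketch below shows one way the staged flow described in the abstract could be orchestrated in code. It is a minimal, hypothetical illustration, not the authors' implementation: the layer labels as identifiers, the placeholder stages, the function names, and the turn-counter heuristic for degradation are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The paper's five protection layers, listed here only as labels.
LAYERS = [
    "bias_self_monitoring",
    "human_ai_adversarial_challenge",
    "partnership_state_verification",
    "performance_degradation_detection",
    "stakeholder_protection",
]


@dataclass
class PartnershipState:
    calibrated: bool = False
    turns_since_calibration: int = 0
    flags: List[str] = field(default_factory=list)


def run_stages(stages: List[Callable[[PartnershipState], bool]],
               state: PartnershipState) -> bool:
    """Run an ordered stage sequence; stop at the first stage that fails."""
    for stage in stages:
        if not stage(state):
            state.flags.append(f"failed:{stage.__name__}")
            return False
    return True


def make_stage(name: str) -> Callable[[PartnershipState], bool]:
    """Placeholder stage; the real stage content is described in the paper."""
    def stage(state: PartnershipState) -> bool:
        return True
    stage.__name__ = name
    return stage


# Stand-ins for the 4-stage initialization and the emergent 7-stage
# calibration sequence that sustains the partnership state.
INIT_STAGES = [make_stage(f"init_{i}") for i in range(1, 5)]
CALIBRATION_STAGES = [make_stage(f"calibration_{i}") for i in range(1, 8)]


def decision_turn(state: PartnershipState, recalibrate_every: int = 5) -> str:
    """One decision-support turn under the five-layer protection architecture."""
    if not state.calibrated:
        if not (run_stages(INIT_STAGES, state)
                and run_stages(CALIBRATION_STAGES, state)):
            return "dissolve"  # dissolution discipline: abandon the direction early
        state.calibrated = True
        state.turns_since_calibration = 0
    state.turns_since_calibration += 1
    # Degradation detection: architectural drift and context exhaustion are
    # approximated here by a simple turn counter, purely for illustration.
    if state.turns_since_calibration >= recalibrate_every:
        state.calibrated = False  # force re-calibration on the next turn
        return "recalibrate"
    return "proceed"


if __name__ == "__main__":
    state = PartnershipState()
    for turn in range(1, 8):
        print(turn, decision_turn(state))
```

In the study itself, maintaining the partnership state required emergent maintenance protocols rather than a fixed schedule; the turn counter above merely stands in for the point at which architectural drift and context exhaustion align and reliability degrades.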
Related papers
- Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation [2.102846336724103]
Procedure-Aware Evaluation (PAE) is a framework that formalizes agent procedures as structured observations. We evaluate state-of-the-art Large Language Model (LLM)-based agents on tau-bench.
arXiv Detail & Related papers (2026-03-03T15:47:41Z)
- MAS-FIRE: Fault Injection and Reliability Evaluation for LLM-Based Multi-Agent Systems [38.44649280816596]
We propose MAS-FIRE, a systematic framework for fault injection and reliability evaluation of Multi-Agent Systems. We define a taxonomy of 15 fault types covering intra-agent cognitive errors and inter-agent coordination failures. Applying MAS-FIRE to three representative MAS architectures, we uncover a rich set of fault-tolerant behaviors.
arXiv Detail & Related papers (2026-02-23T13:47:43Z)
- The Chancellor Trap: Administrative Mediation and the Hollowing of Sovereignty in the Algorithmic Age [0.0]
In high-throughput organizations, AI-mediated decision support can reduce the probability that failures become publicly legible and politically contestable. The article formalizes this dynamic as a principal-agent problem characterized by a verification gap. The results are consistent with a paradox of competence: governance systems may become more effective at absorbing and resolving failures internally while simultaneously raising the threshold at which those failures become politically visible.
arXiv Detail & Related papers (2026-02-09T07:28:44Z)
- Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification [71.98473277917962]
Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. We propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. We present DeepVerifier, a rubrics-based outcome reward verifier that leverages the asymmetry of verification.
arXiv Detail & Related papers (2026-01-22T09:47:31Z)
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search [72.87861928940929]
Boundary-Aware Policy Optimization (BAPO) is a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy. BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii) an adaptive reward modulator that strategically suspends this reward during early exploration, preventing the model from exploiting IDK as a shortcut.
arXiv Detail & Related papers (2026-01-16T07:06:58Z)
- From Failure Modes to Reliability Awareness in Generative and Agentic AI System [0.20391237204597365]
This chapter bridges technical analysis and organizational preparedness by tracing the path from layered failure modes to reliability awareness in generative and agentic AI systems. We first introduce an 11-layer failure stack, a structured framework for identifying vulnerabilities ranging from hardware and power foundations to adaptive learning and agentic reasoning. To complement this diagnostic lens, we develop the concept of awareness mapping: a maturity-oriented framework that quantifies how well individuals and organizations recognize reliability risks across the AI stack.
arXiv Detail & Related papers (2025-10-24T19:12:07Z)
- Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation [7.3923284353934875]
We propose a method for confidence estimation in retrieval-augmented generation (RAG) systems that aligns closely with the correctness of large language model (LLM) outputs. Our approach extends prior uncertainty quantification methods by leveraging raw feed-forward network (FFN) activations as auto-regressive signals. Our results demonstrate that activation-based confidence modeling offers a scalable, architecture-aware path toward trustworthy RAG deployment.
arXiv Detail & Related papers (2025-10-15T16:55:56Z)
- Benchmarking is Broken -- Don't Let AI be its Own Judge [22.93026946593552]
We argue that the current laissez-faire approach to evaluating AI is unsustainable. We introduce PeerBench, a community-governed, proctored evaluation blueprint. Our goal is to pave the way for evaluations that can restore integrity and deliver genuinely trustworthy measures of AI progress.
arXiv Detail & Related papers (2025-10-08T21:41:37Z)
- Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z)
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation. We propose methods tailored to the unique properties of perception and decision-making. We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z)
- Meta-Sealing: A Revolutionizing Integrity Assurance Protocol for Transparent, Tamper-Proof, and Trustworthy AI System [0.0]
This research introduces Meta-Sealing, a cryptographic framework that fundamentally changes integrity verification in AI systems.
The framework combines advanced cryptography with distributed verification, delivering tamper-evident guarantees that achieve both mathematical rigor and computational efficiency.
arXiv Detail & Related papers (2024-10-31T15:31:22Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the ability to explain decisions, and addressing bias in training data are among the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)