Related papers: Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

URL: http://arxiv.org/abs/2602.23239v2
Date: Sun, 01 Mar 2026 18:44:49 GMT
Title: Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Authors: Radha Sarma,
Abstract summary: AI systems are increasingly deployed in high-stakes contexts under the assumption they can be governed by norms.<n>This paper demonstrates that the assumption is formally invalid for optimization-based systems.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) under the assumption they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two necessary and jointly sufficient architectural conditions. First, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability). Second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful, unifying all values on a scalar metric and always selecting the highest-scoring output, are precisely the operations that preclude normative governance and agency. This incompatibility is not a correctable training bug awaiting a technical fix. It is a formal constraint inherent to what optimization is. Consequently, documented failure modes (sycophancy, hallucination, and unfaithful reasoning) are not accidents but expected structural manifestations. Misaligned deployment triggers a second-order risk termed the Convergence Crisis. When humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component capable of bearing normative accountability. Beyond the incompatibility proof, this paper's primary positive contribution is a substrate-neutral architectural specification deriving what any system (biological, artificial, or institutional) must necessarily satisfy to qualify as a genuine agent rather than a sophisticated instrument.

Related papers

Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System [26.405948122941467]
We present GEARS, a framework that reframes ranking optimization as an autonomous discovery process.<n>We show that GEARS consistently identifies superior, near-Pareto-efficient policies by synergizing algorithmic signals with deep ranking context.
arXiv Detail & Related papers (2026-02-20T22:24:01Z)
FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight [21.731032636844237]
This paper proposes a neuro-symbolic framework that employs a bidirectional Formal-of-Thought architecture.<n>We validate across three benchmarks spanning behavioral safety, multi-domain constraint adherence, and agentic upward deception detection.
arXiv Detail & Related papers (2026-02-11T18:48:11Z)
ProAct: Agentic Lookahead in Interactive Environments [56.50613398808361]
ProAct is a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm.<n>We introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search.<n>We also propose the Monte-Carlo Critic (MC-Critic), a plug-and-play auxiliary value estimator designed to enhance policy-gradient algorithms.
arXiv Detail & Related papers (2026-02-05T05:45:16Z)
CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis [0.0]
We introduce CircuChain, a diagnostic benchmark designed to disentangle instruction compliance from physical reasoning competence in electrical circuit analysis.<n>A multi-stage verification pipeline, combining symbolic solvers, SPICE simulation, and an LLM-based error taxonomy, enables fine-grained attribution of failures to convention errors.<n>The strongest model evaluated exhibits near-perfect physical reasoning but a high rate of convention violations when Trap conditions deliberately invert natural sign patterns.
arXiv Detail & Related papers (2026-01-29T06:13:44Z)
Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals.<n>Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z)
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence [60.202862987441684]
We introduce scientific instruction following: the capability to solve problems while strictly adhering to the constraints that establish scientific validity.<n>Specifically, we introduce SciIF, a multi-discipline benchmark that evaluates this capability by pairing university-level problems with a fixed catalog of constraints.<n>By measuring both solution correctness and multi-constraint adherence, SciIF enables finegrained diagnosis of compositional reasoning failures.
arXiv Detail & Related papers (2026-01-08T09:45:58Z)
From Linear Risk to Emergent Harm: Complexity as the Missing Core of AI Governance [0.0]
Risk-based AI regulation promises proportional controls aligned with anticipated harms.<n>This paper argues that such frameworks often fail for structural reasons.<n>We propose a complexity-based framework for AI governance that treats regulation as intervention rather than control.
arXiv Detail & Related papers (2025-12-14T14:19:21Z)
Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties.<n>We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains.<n>We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization [0.0]
We argue that approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective.<n>A principled limit on the optimization of General-Purpose AI systems is necessary.
arXiv Detail & Related papers (2025-10-03T09:25:12Z)
Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts [75.20929587906228]
Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and role-specialized coordination in complex tasks.<n>However, reliability-critical deployment remains hindered by a systemic failure mode: hierarchical compliance under instruction conflicts.
arXiv Detail & Related papers (2025-09-27T08:43:34Z)
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce textbf sycophancy Mitigation through Adaptive Reasoning Trajectories.<n>We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
Explainable AI Systems Must Be Contestable: Here's How to Make It Happen [2.5875936082584623]
This paper presents the first rigorous formal definition of contestability in explainable AI.<n>We introduce a modular framework of by-design and post-hoc mechanisms spanning human-centered interfaces, technical processes, and organizational architectures.<n>Our work equips practitioners with the tools to embed genuine recourse and accountability into AI systems.
arXiv Detail & Related papers (2025-06-02T13:32:05Z)
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL. We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training. For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty [77.18483084440182]
Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. We present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs.
arXiv Detail & Related papers (2021-06-13T23:08:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.