Selection as Power: Constrained Reinforcement for Bounded Decision Authority
- URL: http://arxiv.org/abs/2603.02019v1
- Date: Mon, 02 Mar 2026 16:02:34 GMT
- Title: Selection as Power: Constrained Reinforcement for Bounded Decision Authority
- Authors: Jose Manuel de la Chica Rodriguez, Juan Manuel Vera Díaz,
- Abstract summary: We introduce incentivized selection governance, where reinforcement updates are applied to scoring and reducer parameters under externally enforced sovereignty constraints. We show that learning dynamics can coexist with structural diversity when sovereignty constraints are enforced at every update step.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selection as Power argued that upstream selection authority, rather than internal objective misalignment, constitutes a primary source of risk in high-stakes agentic systems. However, the original framework was static: governance constraints bounded selection power but did not adapt over time. In this work, we extend the framework to dynamic settings by introducing incentivized selection governance, where reinforcement updates are applied to scoring and reducer parameters under externally enforced sovereignty constraints. We formalize selection as a constrained reinforcement process in which parameter updates are projected onto governance-defined feasible sets, preventing concentration beyond prescribed bounds. Across multiple regulated financial scenarios, unconstrained reinforcement consistently collapses into deterministic dominance under repeated feedback, especially at higher learning rates. In contrast, incentivized governance enables adaptive improvement while maintaining bounded selection concentration. Projection-based constraints transform reinforcement from irreversible lock-in into controlled adaptation, with governance debt quantifying the tension between optimization pressure and authority bounds. These results demonstrate that learning dynamics can coexist with structural diversity when sovereignty constraints are enforced at every update step, offering a principled approach to integrating reinforcement into high-stakes agentic systems without surrendering bounded selection authority.
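The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes selection authority is a probability vector over four candidate options, reinforcement is a multiplicative-weights update, and the governance-defined feasible set is a simplex with a per-option cap. The function names (`project_capped_simplex`, `reinforce`, `run`), the reward values, and the cap of 0.4 are all hypothetical choices made for illustration.

```python
import math


def project_capped_simplex(p, cap, iters=100):
    """Euclidean projection of p onto {q : sum(q) = 1, 0 <= q_i <= cap}.

    Assumed stand-in for a governance-defined feasible set; the paper's
    actual constraint set is not specified in the abstract.
    """
    if cap * len(p) < 1.0:
        raise ValueError("cap too small: feasible set is empty")
    # Bisection on the shift tau such that the clipped vector sums to 1.
    lo, hi = min(p) - cap, max(p)
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        total = sum(min(max(x - tau, 0.0), cap) for x in p)
        if total > 1.0:
            lo = tau  # too much mass left, shift further down
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    return [min(max(x - tau, 0.0), cap) for x in p]


def reinforce(p, rewards, lr):
    """Multiplicative-weights update (an assumed reinforcement rule)."""
    w = [pi * math.exp(lr * r) for pi, r in zip(p, rewards)]
    z = sum(w)
    return [wi / z for wi in w]


def run(steps, lr, cap=None):
    """Repeat feedback for `steps` rounds; project after each update if cap is set."""
    p = [0.25, 0.25, 0.25, 0.25]   # uniform initial selection weights
    rewards = [1.0, 0.9, 0.8, 0.7]  # hypothetical fixed feedback per option
    for _ in range(steps):
        p = reinforce(p, rewards, lr)
        if cap is not None:
            p = project_capped_simplex(p, cap)
    return p


if __name__ == "__main__":
    print("unconstrained:", run(200, 0.5))
    print("cap = 0.4:    ", run(200, 0.5, cap=0.4))
```

Under these assumptions the unconstrained run concentrates almost all weight on the highest-reward option, while the projected run keeps every option at or below the cap after each update. The cap plays the role of a concentration bound here; other feasible sets (for example an entropy floor) would need a different projection.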
Related papers
- Operationalizing Fairness: Post-Hoc Threshold Optimization Under Hard Resource Limits [0.0]
The deployment of machine learning in high-stakes domains requires a balance between predictive safety and algorithmic fairness. We introduce a post-hoc, model-agnostic threshold optimization framework that jointly balances safety, efficiency, and equity under strict and hard capacity constraints.
arXiv Detail & Related papers (2026-02-26T02:56:36Z)
- Towards Selection as Power: Bounding Decision Authority in Autonomous Agents [0.0]
We propose a governance architecture that separates cognition, selection, and action into distinct domains and models autonomy as a vector of sovereignty. We evaluate the system across multiple regulated financial scenarios under adversarial stress targeting variance manipulation, threshold gaming, framing skew, ordering effects, and entropy probing. Results show that mechanical selection governance is implementable, auditable, and prevents deterministic outcome capture while preserving reasoning capacity.
arXiv Detail & Related papers (2026-02-16T10:10:47Z)
- Constrained Group Relative Policy Optimization [18.3888203751956]
We introduce Constrained GRPO, a Lagrangian-based extension of GRPO for constrained policy optimization. We show that a naive multi-component treatment in advantage estimation can break constrained learning. We also evaluate Constrained GRPO on robotics tasks, where it improves constraint satisfaction while increasing task success.
arXiv Detail & Related papers (2026-02-05T16:44:23Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
arXiv Detail & Related papers (2025-11-04T13:42:05Z)
- Overfitting in Adaptive Robust Optimization [4.66948282422762]
We propose assigning constraint-specific uncertainty set sizes, with harder constraints given stronger probabilistic guarantees. This view motivates a principled approach to designing uncertainty sets that balances robustness and adaptivity.
arXiv Detail & Related papers (2025-09-19T22:09:51Z)
- Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning [9.62939764063531]
Constrained Reinforcement Learning aims to maximize the return while adhering to predefined constraint limits. In continuous control settings, balancing the trade-off between reward and constraint satisfaction remains a significant challenge. We introduce a novel approach that integrates an adaptive incentive mechanism in addition to the reward structure to stay within the constraint bound.
arXiv Detail & Related papers (2025-09-11T07:33:35Z)
- Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints [52.37099916582462]
In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints.
We propose a theoretically guaranteed penalty function method, Exterior Penalty Policy Optimization (EPO), with adaptive penalties generated by a Penalty Metric Network (PMN).
PMN responds appropriately to varying degrees of constraint violations, enabling efficient constraint satisfaction and safe exploration.
arXiv Detail & Related papers (2024-07-22T10:57:32Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Trust-Region-Free Policy Optimization for Stochastic Policies [60.52463923712565]
We show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.
We call the resulting algorithm Trust-REgion-Free Policy Optimization (TREFree), as it is explicitly free of any trust region constraints.
arXiv Detail & Related papers (2023-02-15T23:10:06Z)
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.