Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties
- URL: http://arxiv.org/abs/2512.03931v1
- Date: Wed, 03 Dec 2025 16:29:09 GMT
- Title: Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties
- Authors: Vineel Tummala, Daniela Inclezan
- Abstract summary: This paper presents a logic programming-based framework for policy-aware autonomous agents that can reason about potential penalties for non-compliance. Our framework extends Gelfond and Lobo's Authorization and Obligation Policy Language (AOPL) to incorporate penalties. Our method ensures well-formed policies, accounts for policy priorities, and enhances explainability by explicitly identifying rule violations.
- Score: 1.2891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a logic programming-based framework for policy-aware autonomous agents that can reason about potential penalties for non-compliance and act accordingly. While prior work has primarily focused on ensuring compliance, our approach considers scenarios where deviating from policies may be necessary to achieve high-stakes goals. Additionally, modeling non-compliant behavior can assist policymakers by simulating realistic human decision-making. Our framework extends Gelfond and Lobo's Authorization and Obligation Policy Language (AOPL) to incorporate penalties and integrates Answer Set Programming (ASP) for reasoning. Compared to previous approaches, our method ensures well-formed policies, accounts for policy priorities, and enhances explainability by explicitly identifying rule violations and their consequences. Building on the work of Harders and Inclezan, we introduce penalty-based reasoning to distinguish between non-compliant plans, prioritizing those with minimal repercussions. To support this, we develop an automated translation from the extended AOPL into ASP and refine ASP-based planning algorithms to account for incurred penalties. Experiments in two domains demonstrate that our framework generates higher-quality plans that avoid harmful actions while, in some cases, also improving computational efficiency. These findings underscore its potential for enhancing autonomous decision-making and informing policy refinement. Under consideration in Theory and Practice of Logic Programming (TPLP).
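To make the reasoning style concrete, here is a minimal sketch of penalty-minimizing planning in ASP via the clingo Python API. The toy rules below are hypothetical stand-ins for the paper's automated AOPL-to-ASP translation, which is not reproduced here; the weak constraint (`:~`) is what lets the solver prefer plans with minimal total penalty.

```python
# Minimal sketch: penalty-aware planning with ASP (assumes the clingo package).
# The toy policy below is illustrative, NOT the paper's actual AOPL encoding.
import clingo

PROGRAM = r"""
step(0..2).
% At each step the agent either complies with the rule or violates it.
1 { do(comply, T) ; do(violate, T) } 1 :- step(T).
% Toy dynamics: violating at step 0 reaches the goal immediately;
% otherwise the goal requires complying at every step.
goal :- do(violate, 0).
goal :- do(comply, 0), do(comply, 1), do(comply, 2).
:- not goal.
% Each violation incurs a penalty of 5 (hypothetical penalty atom).
penalty(T, 5) :- do(violate, T).
% Weak constraint: prefer answer sets with minimal total incurred penalty.
:~ penalty(T, P). [P@1, T]
#show do/2.
"""

def on_model(model):
    # model.cost lists the total penalty at each optimization priority level.
    print("plan:", [str(s) for s in model.symbols(shown=True)], "cost:", model.cost)

ctl = clingo.Control()
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])
ctl.solve(on_model=on_model)
```

Here the fully compliant plan costs 0 and wins, but if compliance could not reach the goal, the solver would fall back to the cheapest non-compliant plan, which is the distinction the paper's penalty-based reasoning draws.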
Related papers
- Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking [4.064849471241967]
This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints. Experiments on a few hundred benchmarks demonstrate the feasibility of constrained and robust policy synthesis.
arXiv Detail & Related papers (2025-11-11T10:28:42Z)
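As a rough illustration of the satisfiability-modulo flavor of this approach (not the paper's actual encoding), policy choices and reachability probabilities can be phrased as constraints for an SMT optimizer such as z3. Everything below, including the two-state MDP and the "same action everywhere" structural constraint, is an invented toy.

```python
# Toy sketch of SMT-based policy synthesis (assumes the z3-solver package).
from z3 import Bool, Real, Optimize, If, sat

use_a0 = Bool("use_a_in_s0")     # policy choice in state s0 (True = action a)
use_a1 = Bool("use_a_in_s1")     # policy choice in state s1
p0, p1 = Real("p0"), Real("p1")  # goal-reachability probabilities

opt = Optimize()
# Bellman-style reachability equations under the chosen policy.
opt.add(p1 == If(use_a1, 0.8, 0.3))
opt.add(p0 == If(use_a0, 0.4 + 0.6 * p1, 0.7))
# Structural constraint: the policy must pick the same action in both states.
opt.add(use_a0 == use_a1)
opt.maximize(p0)

if opt.check() == sat:
    m = opt.model()
    print("action a in s0:", m[use_a0], "| reachability:", m[p0])
```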
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling the reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
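A hand-wavy sketch of the masking idea follows; the actual curvature statistic CAPO tracks is not given in this summary, so a per-sample gradient-norm proxy stands in for it here.

```python
# Illustrative sketch of curvature-aware sample masking (NOT the CAPO
# algorithm itself; the gradient-norm proxy below is an assumption).
import numpy as np

def masked_policy_gradient(per_sample_grads: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Average per-sample gradients, dropping samples whose gradient norm is
    more than k standard deviations above the batch mean (a crude stand-in
    for 'samples that contribute to unstable updates')."""
    norms = np.linalg.norm(per_sample_grads, axis=1)
    keep = norms <= norms.mean() + k * norms.std()
    return per_sample_grads[keep].mean(axis=0)

rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 8))
grads[3] *= 50.0  # one pathological sample that would dominate the update
print(masked_policy_gradient(grads))
```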
- Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces [12.671657542087624]
Policy Reasoning Traces (PRTs) are specialized generated reasoning chains that serve as a reasoning bridge to improve an LLM's policy compliance assessment capabilities. Our empirical evaluations demonstrate that using PRTs in both inference-time and training-time scenarios significantly enhances the performance of open-weight and commercial models.
arXiv Detail & Related papers (2025-09-27T13:10:21Z)
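The summary does not show what a trace looks like; presumably it is a prompt that walks the model through the policy clause by clause. The template below is purely hypothetical.

```python
# Hypothetical prompt template in the spirit of Policy Reasoning Traces;
# the exact trace format used in the paper is not shown in its summary.
def build_prt_prompt(policy: str, item: str) -> str:
    return (
        "Policy:\n" + policy + "\n\n"
        "Content to assess:\n" + item + "\n\n"
        "Reason step by step: (1) identify the policy clauses that could apply, "
        "(2) check each clause against the content, "
        "(3) conclude COMPLIANT or NON-COMPLIANT, naming any violated clause."
    )

print(build_prt_prompt("No personal data may be shared with third parties.",
                       "We forward user emails to our advertising partner."))
```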
- Proactive Constrained Policy Optimization with Preemptive Penalty [11.93135424276656]
We propose a novel preemptive penalty mechanism for constrained policy optimization. This mechanism integrates barrier terms into the objective function as the policy nears the constraint boundary, imposing a cost before any violation occurs. We also introduce a constraint-aware intrinsic reward to guide boundary-aware exploration, activated only when the policy approaches the constraint boundary.
arXiv Detail & Related papers (2025-08-03T18:35:55Z)
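A minimal sketch of the barrier idea: the specific barrier the paper uses is not given in the summary, so a log-barrier on the remaining constraint margin, a common choice, stands in here.

```python
# Sketch of a preemptive (barrier) penalty: cost grows as the constraint
# value c approaches its limit, BEFORE any violation happens. The log-barrier
# form, the activation margin, and the weight are all assumptions.
import math

def preemptive_penalty(c: float, limit: float, margin: float = 0.1,
                       weight: float = 0.05) -> float:
    slack = limit - c                 # distance to the constraint boundary
    if slack >= margin:               # far from the boundary: no cost at all
        return 0.0
    slack = max(slack, 1e-8)          # keep the barrier finite
    return weight * (math.log(margin) - math.log(slack))

for c in (0.5, 0.85, 0.95, 0.999):
    print(f"c={c:.3f} penalty={preemptive_penalty(c, limit=1.0):.4f}")
```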
- Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes [59.27926064817273]
We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under gradient domination assumptions. We empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks.
arXiv Detail & Related papers (2025-06-06T10:29:05Z)
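C-PG's actual update rules are in the paper; as a generic reference point, the primal-dual (Lagrangian) constrained policy-gradient loop that this family of methods refines looks roughly like this, on a toy quadratic problem.

```python
# Generic primal-dual constrained policy-gradient loop (a textbook pattern,
# not C-PG itself): maximize reward J(theta) subject to cost Jc(theta) <= limit.
def primal_dual(grad_J, grad_Jc, Jc, theta, limit,
                steps=500, lr_theta=0.05, lr_lambda=0.05):
    lam = 0.0
    for _ in range(steps):
        # Primal ascent on the Lagrangian L = J - lam * (Jc - limit).
        theta = theta + lr_theta * (grad_J(theta) - lam * grad_Jc(theta))
        # Dual ascent: raise lam while the constraint is violated.
        lam = max(0.0, lam + lr_lambda * (Jc(theta) - limit))
    return theta, lam

# Toy example: J(x) = -(x-2)^2, Jc(x) = x, limit = 1 -> optimum at x = 1.
theta, lam = primal_dual(grad_J=lambda x: -2.0 * (x - 2.0),
                         grad_Jc=lambda x: 1.0,
                         Jc=lambda x: x, theta=0.0, limit=1.0)
print(f"theta={theta:.3f} (expect ~1.0), lambda={lam:.3f}")
```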
- Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs [13.443196224057658]
Action-constrained reinforcement learning (ACRL) is a generic framework for learning control policies with zero action-constraint violations. We propose a generic and computationally efficient framework that can adapt a standard unconstrained RL method to ACRL. We show that the proposed framework simultaneously achieves faster training progress, better constraint satisfaction, and lower action inference time than state-of-the-art ACRL methods.
arXiv Detail & Related papers (2025-03-17T08:41:43Z)
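The acceptance-rejection step named in the title suggests something like the following generic rejection sampler over a box constraint; the paper's augmented-MDP machinery is not reproduced, and the Gaussian policy is an assumption.

```python
# Generic acceptance-rejection action sampling: draw from the unconstrained
# policy and resample until the action satisfies the constraint.
import numpy as np

def sample_feasible_action(mean, std, low, high, rng, max_tries=100):
    for _ in range(max_tries):
        a = rng.normal(mean, std)
        if np.all(a >= low) and np.all(a <= high):    # accept feasible draws
            return a
    return np.clip(rng.normal(mean, std), low, high)  # fallback: project

rng = np.random.default_rng(0)
a = sample_feasible_action(mean=np.zeros(2), std=1.0,
                           low=-0.5, high=0.5, rng=rng)
print(a)
```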
- Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints [52.37099916582462]
In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints.
We propose a theoretically guaranteed penalty function method, Exterior Penalty Policy Optimization (EPO), with adaptive penalties generated by a Penalty Metric Network (PMN). The PMN responds appropriately to varying degrees of constraint violation, enabling efficient constraint satisfaction and safe exploration.
arXiv Detail & Related papers (2024-07-22T10:57:32Z)
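In contrast to the preemptive barrier above, a classical exterior penalty charges nothing inside the feasible region and grows with the violation magnitude; EPO makes the penalty weight adaptive via the learned PMN, which is only gestured at by the hand-rolled weight below.

```python
# Classical exterior penalty (cost only AFTER the constraint is violated),
# with a stand-in "adaptive weight" in place of EPO's learned Penalty Metric
# Network. The quadratic form and the weight schedule are assumptions.
def exterior_penalty(c: float, limit: float, base_weight: float = 1.0) -> float:
    violation = max(0.0, c - limit)
    adaptive_weight = base_weight * (1.0 + violation)  # grows with severity
    return adaptive_weight * violation ** 2

for c in (0.8, 1.0, 1.2, 1.5):
    print(f"c={c:.2f} penalty={exterior_penalty(c, limit=1.0):.4f}")
```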
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Penalization Framework For Autonomous Agents Using Answer Set Programming [0.0]
This paper presents a framework for enforcing penalties on intelligent agents that do not comply with authorization or obligation policies in a changing environment.
It represents and reasons about penalties in plans, and proposes an algorithm that penalizes an agent's actions based on their level of compliance with authorization and obligation policies.
arXiv Detail & Related papers (2023-08-30T09:09:27Z)
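A toy rendering of "penalize actions by compliance level": the compliance categories and penalty values below are invented for illustration, and the actual framework expresses this in ASP rather than Python.

```python
# Hypothetical compliance-level scoring for candidate plans; the levels and
# weights are invented, not taken from the paper.
COMPLIANCE_PENALTY = {"compliant": 0, "weakly_compliant": 1, "non_compliant": 5}

def plan_penalty(plan, classify):
    """Sum penalties over a plan's actions; `classify` maps an action to a
    compliance level with respect to authorization/obligation policies."""
    return sum(COMPLIANCE_PENALTY[classify(a)] for a in plan)

# Rank candidate plans, preferring the smallest total penalty.
plans = [["drive", "park"], ["speed", "park"]]
classify = lambda a: "non_compliant" if a == "speed" else "compliant"
best = min(plans, key=lambda p: plan_penalty(p, classify))
print(best, plan_penalty(best, classify))
```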
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present PARL, a novel unified bilevel optimization-based framework formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning. Our framework addresses these concerns by explicitly parameterizing the distribution of the upper-level alignment objective (reward design) by the lower-level optimal variable. Our empirical results substantiate that PARL can address the alignment concerns in RL, showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
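The bilevel shape, an outer reward-design update wrapped around an inner policy optimization, can be caricatured as follows; the toy objectives and step sizes are assumptions, not PARL's actual algorithm.

```python
# Caricature of a bilevel alignment loop: the inner loop optimizes the policy
# against the current reward parameters, the outer loop nudges the reward
# parameters so the induced behavior matches the alignment target.
def inner_policy_opt(reward_param, steps=200, lr=0.1):
    theta = 0.0
    for _ in range(steps):              # maximize -(theta - reward_param)^2
        theta += lr * (-2.0 * (theta - reward_param))
    return theta

def outer_alignment_opt(target=3.0, steps=50, lr=0.2):
    w = 0.0                             # reward-design parameter
    for _ in range(steps):
        theta_star = inner_policy_opt(w)        # lower-level optimal variable
        w += lr * (target - theta_star)         # push induced behavior to target
    return w, inner_policy_opt(w)

w, theta = outer_alignment_opt()
print(f"reward param w={w:.3f}, induced policy theta={theta:.3f} (target 3.0)")
```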
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach that distills model-based planning into the policy with a theoretical guarantee of policy improvement.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
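Distillation here typically means regressing the policy onto planner-optimized actions; a bare-bones version of that regression, with all specifics (linear policy, squared loss, synthetic planner data) assumed, is:

```python
# Bare-bones planner-to-policy distillation: fit a linear policy to the
# actions a planner produced for visited states.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))                                 # visited states
planner_actions = states @ np.array([0.5, -1.0, 0.2, 0.0]) + 0.1   # "expert" plans

# Least-squares distillation: minimize ||S w - a_plan||^2 with a bias column.
S = np.hstack([states, np.ones((256, 1))])
w = np.linalg.lstsq(S, planner_actions, rcond=None)[0]
policy = lambda s: np.append(s, 1.0) @ w

print("distilled action for a test state:", policy(np.ones(4)))
```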
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)