Regulating the Agency of LLM-based Agents
- URL: http://arxiv.org/abs/2509.22735v1
- Date: Thu, 25 Sep 2025 20:14:02 GMT
- Title: Regulating the Agency of LLM-based Agents
- Authors: Seán Boddy, Joshua Joseph,
- Abstract summary: We propose an approach that directly measures and controls the agency of AI systems.<n>We conceptualize the agency of LLM-based agents as a property independent of intelligence-related measures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As increasingly capable large language model (LLM)-based agents are developed, the potential harms caused by misalignment and loss of control grow correspondingly severe. To address these risks, we propose an approach that directly measures and controls the agency of these AI systems. We conceptualize the agency of LLM-based agents as a property independent of intelligence-related measures and consistent with the interdisciplinary literature on the concept of agency. We offer (1) agency as a system property operationalized along the dimensions of preference rigidity, independent operation, and goal persistence, (2) a representation engineering approach to the measurement and control of the agency of an LLM-based agent, and (3) regulatory tools enabled by this approach: mandated testing protocols, domain-specific agency limits, insurance frameworks that price risk based on agency, and agency ceilings to prevent societal-scale risks. We view our approach as a step toward reducing the risks that motivate the ``Scientist AI'' paradigm, while still capturing some of the benefits from limited agentic behavior.
Related papers
- Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - AURA: An Agent Autonomy Risk Assessment Framework [0.0]
AURA (Agent aUtonomy Risk Assessment) is a unified framework designed to detect, quantify, and mitigate risks arising from agentic AI.<n>AURA provides an interactive process to score, evaluate and mitigate the risks of running one or multiple AI Agents, synchronously or asynchronously.<n>AURA supports a responsible and transparent adoption of agentic AI and provides robust risk detection and mitigation while balancing computational resources.
arXiv Detail & Related papers (2025-10-17T15:30:29Z) - Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems [0.0]
This report addresses the early stages of risk identification and analysis for multi-agent AI systems.<n>We examine six critical failure modes: cascading reliability failures, inter-agent communication failures, monoculture collapse, conformity bias, deficient theory of mind, and mixed motive dynamics.
arXiv Detail & Related papers (2025-08-06T06:06:57Z) - SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z) - TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems [8.683314804719506]
This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of Agentic Multi-Agent Systems (AMAS)<n>We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents.<n>We then adapt and extend the AI TRiSM framework for Agentic AI, structured around key pillars: textit Explainability, ModelOps, Security, Privacy and textittheir lifecycle governance<n>A risk taxonomy is proposed to capture the unique threats and vulnerabilities of Agentic AI, ranging from coordination failures to
arXiv Detail & Related papers (2025-06-04T16:26:11Z) - LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale.<n>We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z) - Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios.<n>Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z) - Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective [0.0]
Agentic systems powered by large language models (LLMs) are becoming progressively more complex and capable.<n>Their increasing agency and expanding deployment settings attract growing attention to effective governance policies, monitoring, and control protocols.<n>We analyze potential liability issues arising from the delegated use of LLM agents and their extended systems through a principal-agent perspective.
arXiv Detail & Related papers (2025-04-04T08:10:02Z) - AI Agents Should be Regulated Based on the Extent of Their Autonomous Operations [8.043534206868326]
We argue that AI agents should be regulated by the extent to which they operate autonomously.<n>Existing regulations often focus on computational scale as a proxy for potential harm.<n>We discuss relevant regulations and recommendations from scientists regarding existential risks.
arXiv Detail & Related papers (2025-02-07T09:40:48Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy [65.77763092833348]
This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse.<n>We take into account user intent, the specific scientific domain, and their potential impact on the external environment.<n>We propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the LLM-based agent safety.
The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: pre-planning strategy which injects safety knowledge to the model before plan generation, in-planning strategy which enhances safety during plan generation, and post-planning strategy which ensures safety by post-planning inspection.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.