Position: AI agents should be regulated based on autonomous action sequences
- URL: http://arxiv.org/abs/2503.04750v1
- Date: Fri, 07 Feb 2025 09:40:48 GMT
- Title: Position: AI agents should be regulated based on autonomous action sequences
- Authors: Takayuki Osogami
- Abstract summary: We argue that AI agents should be regulated based on the sequence of actions they autonomously take. We discuss relevant regulations and recommendations from AI scientists regarding existential risks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This position paper argues that AI agents should be regulated based on the sequence of actions they autonomously take. AI agents with long-term planning and strategic capabilities can pose significant risks of human extinction and irreversible global catastrophes. While existing regulations often focus on computational scale as a proxy for potential harm, we contend that such measures are insufficient for assessing the risks posed by AI agents whose capabilities arise primarily from inference-time computation. To support our position, we discuss relevant regulations and recommendations from AI scientists regarding existential risks, as well as the advantages of action sequences over existing impact measures that require observing environmental states.
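The paper is a position piece and contains no code; the following minimal sketch (all class names, fields, and thresholds are hypothetical, not taken from the paper) only illustrates the kind of action-sequence-based trigger the position argues for: the agent is assessed by the run of actions it takes without human approval, with no need to observe or model the environment's state.

```python
# Illustrative sketch, not from the paper: a monitor that evaluates an agent by the
# sequence of actions it takes autonomously, rather than by training compute.
# Thresholds and action labels below are hypothetical placeholders.
from dataclasses import dataclass, field


@dataclass
class ActionRecord:
    name: str              # e.g. "send_email", "execute_trade"
    human_approved: bool   # True if a human signed off before execution
    irreversible: bool     # True if the action cannot easily be undone


@dataclass
class ActionSequenceMonitor:
    max_autonomous_run: int = 20   # hypothetical threshold on consecutive autonomous actions
    max_irreversible: int = 3      # hypothetical threshold on unapproved irreversible actions
    records: list = field(default_factory=list)

    def log(self, record: ActionRecord) -> None:
        self.records.append(record)

    def current_autonomous_run(self) -> int:
        """Length of the latest uninterrupted run of autonomous (unapproved) actions."""
        run = 0
        for rec in reversed(self.records):
            if rec.human_approved:
                break
            run += 1
        return run

    def requires_review(self) -> bool:
        """Flag the agent for review based only on its recorded action sequence."""
        irreversible = sum(r.irreversible and not r.human_approved for r in self.records)
        return (self.current_autonomous_run() > self.max_autonomous_run
                or irreversible > self.max_irreversible)


monitor = ActionSequenceMonitor()
monitor.log(ActionRecord("search_web", human_approved=False, irreversible=False))
monitor.log(ActionRecord("execute_trade", human_approved=False, irreversible=True))
print(monitor.requires_review())
```

A compute-based threshold, by contrast, would say nothing about an agent whose risky behaviour arises from inference-time planning rather than training scale, which is the gap the position highlights.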
Related papers
- An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address the risk of harms consequential enough to significantly harm humanity.
We focus on technical approaches to misuse and misalignment.
We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
- Confronting Catastrophic Risk: The International Obligation to Regulate Artificial Intelligence [0.0]
We argue that there exists an international obligation to mitigate the threat of human extinction by AI.
We argue that there is a positive obligation on states under the right to life within international human rights law to proactively take regulatory action to mitigate the potential existential risk of AI.
arXiv Detail & Related papers (2025-03-23T06:24:45Z)
- Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? [37.13209023718946]
Unchecked AI agency poses significant risks to public safety and security.
We discuss how these risks arise from current AI training methods.
As a core building block for further advances, we propose the development of a non-agentic AI system.
arXiv Detail & Related papers (2025-02-21T18:28:36Z)
- Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed.
In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels.
Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z)
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- Risk Alignment in Agentic AI Systems [0.0]
Agentic AIs capable of undertaking complex actions with little supervision raise new questions about how to safely create and align such systems with users, developers, and society.
Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly.
We present three papers that bear on key normative and technical aspects of these questions.
arXiv Detail & Related papers (2024-10-02T18:21:08Z)
- Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI [0.0]
This paper argues that developers of frontier AI models should be assigned limited, strict, and exclusive third-party liability for harms resulting from Critical AI Occurrences (CAIOs).
Mandatory insurance for CAIO liability is recommended to overcome developers' judgment-proofness, mitigate winner's curse dynamics, and leverage insurers' quasi-regulatory abilities.
arXiv Detail & Related papers (2024-09-10T17:41:31Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- Visibility into AI Agents [9.067567737098594]
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents may exacerbate existing societal risks.
We assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging (an illustrative sketch of the first and last appears after this list).
arXiv Detail & Related papers (2024-01-23T23:18:33Z)
- Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins (an illustrative sketch appears after this list).
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
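For the Safety Margins entry above, here is a minimal sketch of the general idea, under the assumption that proxy criticality is derived from the spread of a policy's Q-values; the paper's exact definitions may differ, and all names below are illustrative.

```python
# Illustrative sketch (definitions here are assumptions, not the paper's exact ones):
# a proxy criticality metric from Q-values, and a safety margin derived from it.
import numpy as np


def proxy_criticality(q_values: np.ndarray) -> float:
    """Gap between the best action's value and the average action's value.
    A large gap suggests the state is critical: a random action is costly."""
    return float(q_values.max() - q_values.mean())


def safety_margin(q_per_state: list, threshold: float) -> int:
    """Number of consecutive upcoming states whose proxy criticality stays below
    the threshold, i.e. how long a lapse in control could be tolerated."""
    margin = 0
    for q_values in q_per_state:
        if proxy_criticality(q_values) >= threshold:
            break
        margin += 1
    return margin


# Toy example: Q-values for three upcoming states of an Atari-like policy.
upcoming = [np.array([1.0, 0.9, 0.95]),
            np.array([1.0, 0.98, 0.99]),
            np.array([5.0, 0.1, 0.2])]
print(safety_margin(upcoming, threshold=0.5))  # -> 2
```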
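For the Visibility into AI Agents entry above, here is a minimal sketch of two of the three measures, agent identifiers and activity logging; the field names and log format are assumptions, not taken from the paper.

```python
# Illustrative sketch (field names are hypothetical): an agent wrapper that attaches
# a persistent identifier to every action and keeps an append-only activity log.
import json
import time
import uuid


class VisibleAgent:
    """Wraps an agent with an agent identifier and an append-only activity log."""

    def __init__(self, deployer: str):
        self.agent_id = str(uuid.uuid4())   # agent identifier attached to every record
        self.deployer = deployer
        self.log_path = f"agent_{self.agent_id}.log"

    def record(self, action: str, target: str) -> None:
        """Activity logging: one JSON line per action, suitable for later audit."""
        entry = {
            "agent_id": self.agent_id,
            "deployer": self.deployer,
            "timestamp": time.time(),
            "action": action,
            "target": target,
        }
        with open(self.log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")


agent = VisibleAgent(deployer="example-lab")
agent.record(action="http_request", target="https://example.com/api")
```

Real-time monitoring, the third measure, could consume the same JSON lines as they are written.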
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.