Related papers: Risk Alignment in Agentic AI Systems

Risk Alignment in Agentic AI Systems

URL: http://arxiv.org/abs/2410.01927v1
Date: Wed, 2 Oct 2024 18:21:08 GMT
Title: Risk Alignment in Agentic AI Systems
Authors: Hayley Clatterbuck, Clinton Castro, Arvo Muñoz Morán,
Abstract summary: Agentic AIs capable of undertaking complex actions with little supervision raise new questions about how to safely create and align such systems with users, developers, and society. Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly. We present three papers that bear on key normative and technical aspects of these questions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agentic AIs $-$ AIs that are capable and permitted to undertake complex actions with little supervision $-$ mark a new frontier in AI capabilities and raise new questions about how to safely create and align such systems with users, developers, and society. Because agents' actions are influenced by their attitudes toward risk, one key aspect of alignment concerns the risk profiles of agentic AIs. Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly, especially as agentic AIs become more autonomous and are allowed to control key aspects of our lives. AIs with reckless attitudes toward risk (either because they are calibrated to reckless human users or are poorly designed) may pose significant threats. They might also open 'responsibility gaps' in which there is no agent who can be held accountable for harmful actions. What risk attitudes should guide an agentic AI's decision-making? How might we design AI systems that are calibrated to the risk attitudes of their users? What guardrails, if any, should be placed on the range of permissible risk attitudes? What are the ethical considerations involved when designing systems that make risky decisions on behalf of others? We present three papers that bear on key normative and technical aspects of these questions.

Related papers

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.<n>This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z)
"We are not Future-ready": Understanding AI Privacy Risks and Existing Mitigation Strategies from the Perspective of AI Developers in Europe [56.1653658714305]
We interviewed 25 AI developers based in Europe to understand which privacy threats they believe pose the greatest risk to users, developers, and businesses.<n>We find that there is little consensus among AI developers on the relative ranking of privacy risks.<n>While AI developers are aware of proposed mitigation strategies for addressing these risks, they reported minimal real-world adoption.
arXiv Detail & Related papers (2025-10-01T13:51:33Z)
An Artificial Intelligence Value at Risk Approach: Metrics and Models [0.0]
The state of the art of artificial intelligence risk management seems to be highly immature due to upcoming AI regulations.<n>The purpose of this paper is to orientate AI stakeholders about the depths of AI risk management.
arXiv Detail & Related papers (2025-09-22T20:27:29Z)
The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats [0.0]
This paper maps the full spectrum of AI risks, from current harms affecting individual users to existential threats that could endanger humanity's survival.<n>We organize these risks into three main causal categories.<n>Our goal is to help readers understand the complete landscape of AI risks.
arXiv Detail & Related papers (2025-08-19T10:05:51Z)
When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems [78.04679174291329]
We introduce a proof-of-concept to simulate the risks of malicious multi-agent systems (MAS)<n>We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud.<n>Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones.
arXiv Detail & Related papers (2025-07-19T15:17:30Z)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? [37.13209023718946]
Unchecked AI agency poses significant risks to public safety and security. We discuss how these risks arise from current AI training methods. We propose a core building block for further advances the development of a non-agentic AI system.
arXiv Detail & Related papers (2025-02-21T18:28:36Z)
Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels. Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z)
Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom. While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems. We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z)
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization. We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
What's my role? Modelling responsibility for AI-based safety-critical systems [1.0549609328807565]
It is difficult for developers and manufacturers to be held responsible for harmful behaviour of an AI-SCS. A human operator can become a "liability sink" absorbing blame for the consequences of AI-SCS outputs they weren't responsible for creating. This paper considers different senses of responsibility (role, moral, legal and causal), and how they apply in the context of AI-SCS safety.
arXiv Detail & Related papers (2023-12-30T13:45:36Z)
Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science. We highlight real-world examples of misuse in chemical science. We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems. There is a lack of consensus about how exactly such risks arise, and how to manage them. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward [56.16884466478886]
This paper reviews emerging issues with opaque and uncontrollable AI systems. It proposes an integrative framework called violet teaming to develop reliable and responsible AI. It emerged from AI safety research to manage risks proactively by design.
arXiv Detail & Related papers (2023-08-28T02:10:38Z)
An Overview of Catastrophic AI Risks [38.84933208563934]
This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories. Malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs. organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents. rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans.
arXiv Detail & Related papers (2023-06-21T03:35:06Z)
Three lines of defense against risks from AI [0.0]
It is not always clear who is responsible for AI risk management. The Three Lines of Defense (3LoD) model is considered best practice in many industries. I suggest ways in which AI companies could implement the model.
arXiv Detail & Related papers (2022-12-16T09:33:00Z)
A Brief Overview of AI Governance for Responsible Machine Learning Systems [3.222802562733787]
This position paper seeks to present a brief introduction to AI governance, which is a framework designed to oversee the responsible use of AI. Due to the probabilistic nature of AI, the risks associated with it are far greater than traditional technologies.
arXiv Detail & Related papers (2022-11-21T23:48:51Z)
Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations. It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.