The AI Security Zugzwang
- URL: http://arxiv.org/abs/2502.06000v1
- Date: Sun, 09 Feb 2025 19:13:48 GMT
- Title: The AI Security Zugzwang
- Authors: Lampis Alevizos
- Abstract summary: In chess, zugzwang describes a scenario where any move worsens the player's position. In this work we formalize this challenge as the AI Security Zugzwang. We characterize AI security zugzwang through three key properties: forced movement, predictable vulnerability creation, and temporal pressure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In chess, zugzwang describes a scenario where any move worsens the player's position. Organizations face a similar dilemma right now at the intersection of artificial intelligence (AI) and cybersecurity. AI adoption creates an inevitable paradox: delaying it poses strategic risks, rushing it introduces poorly understood vulnerabilities, and even incremental adoption leads to cascading complexities. In this work we formalize this challenge as the AI Security Zugzwang, a phenomenon where security leaders must make decisions under conditions of inevitable risk. Grounded in game theory, security economics, and organizational decision theory, we characterize AI security zugzwang through three key properties: forced movement, predictable vulnerability creation, and temporal pressure. Additionally, we develop a taxonomy to categorize forced-move scenarios across AI adoption, implementation, operational, and governance contexts, and provide corresponding strategic mitigations. Our framework is supported by a practical decision flowchart, demonstrated through a real-world example of Copilot adoption, thus showing how security lead
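The forced-move structure described in the abstract can be sketched as a small decision model: every available move carries positive total risk, so the decision reduces to selecting the least-bad option. This is a minimal illustrative sketch, not the paper's framework; the option names and risk values are hypothetical assumptions.

```python
# Hypothetical sketch of the "forced movement" property: no available
# move is risk-free, so the choice is the least-bad one. All names and
# numbers are illustrative, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Move:
    name: str
    strategic_risk: float  # risk of falling behind competitively
    security_risk: float   # risk from poorly understood vulnerabilities

    @property
    def total_risk(self) -> float:
        return self.strategic_risk + self.security_risk


def is_zugzwang(moves: list[Move]) -> bool:
    """True if every available move strictly worsens the position."""
    return all(m.total_risk > 0 for m in moves)


def least_bad(moves: list[Move]) -> Move:
    """Select the move with the smallest combined risk."""
    return min(moves, key=lambda m: m.total_risk)


moves = [
    Move("delay adoption", strategic_risk=0.7, security_risk=0.1),
    Move("rush adoption", strategic_risk=0.1, security_risk=0.8),
    Move("incremental adoption", strategic_risk=0.3, security_risk=0.3),
]

assert is_zugzwang(moves)      # no risk-free move exists
print(least_bad(moves).name)   # -> incremental adoption
```

Under these assumed values the model recovers the abstract's point: even the best choice (incremental adoption, total risk 0.6) still worsens the position; the framework's contribution is structuring that least-bad selection.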
Related papers
- Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z) - Responsible AI: The Good, The Bad, The AI [1.932555230783329]
This paper presents a comprehensive examination of AI's dual nature through the lens of strategic information systems. We develop the Paradox-based Responsible AI Governance (PRAIG) framework that articulates: (1) the strategic benefits of AI adoption, (2) the inherent risks and unintended consequences, and (3) governance mechanisms that enable organizations to navigate these tensions. The paper concludes with a research agenda for advancing responsible AI governance scholarship.
arXiv Detail & Related papers (2026-01-28T22:33:27Z) - AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field. We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
arXiv Detail & Related papers (2025-11-27T16:56:04Z) - Governable AI: Provable Safety Under Extreme Threat Models [31.36879992618843]
We propose a Governable AI (GAI) framework that shifts from traditional internal constraints to externally enforced structural compliance. The GAI framework is composed of a simple yet reliable, fully deterministic, powerful, flexible, and general-purpose rule enforcement module (REM); governance rules; and a governable secure super-platform (GSSP) that offers end-to-end protection against compromise or subversion by AI.
arXiv Detail & Related papers (2025-08-28T04:22:59Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework [0.0]
Humans are often the weakest link in cybersecurity systems. A misaligned AI system may seek to undermine human oversight by manipulating employees. No systematic framework exists for assessing and mitigating these risks. This paper provides the first systematic methodology for integrating manipulation risk into AI safety governance.
arXiv Detail & Related papers (2025-07-17T07:45:53Z) - AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions [2.07180164747172]
Humanity appears to be on course to soon develop AI systems that substantially outperform human experts. We believe the default trajectory has a high likelihood of catastrophe, including human extinction. Risks come from failure to control powerful AI systems, misuse of AI by malicious rogue actors, war between great powers, and authoritarian lock-in.
arXiv Detail & Related papers (2025-05-07T17:35:36Z) - AI threats to national security can be countered through an incident regime [55.2480439325792]
We propose a legally mandated post-deployment AI incident regime that aims to counter potential national security threats from AI systems.
Our proposed AI incident regime is split into three phases. The first phase revolves around a novel operationalization of what counts as an 'AI incident'.
The second and third phases spell out that AI providers should notify a government agency about incidents, and that the government agency should be involved in amending AI providers' security and safety procedures.
arXiv Detail & Related papers (2025-03-25T17:51:50Z) - Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents [36.49717045080722]
This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios.
We introduce the concept of context manipulation, a comprehensive attack vector that exploits unprotected context surfaces.
To quantify these vulnerabilities, we design CrAIBench, a Web3 domain-specific benchmark that evaluates the robustness of AI agents against context manipulation attacks.
arXiv Detail & Related papers (2025-03-20T15:44:31Z) - Transforming Cyber Defense: Harnessing Agentic and Frontier AI for Proactive, Ethical Threat Intelligence [0.0]
This manuscript explores how the convergence of agentic AI and Frontier AI is transforming cybersecurity.
We examine the roles of real-time monitoring, automated incident response, and perpetual learning in forging a resilient, dynamic defense ecosystem.
Our vision is to harmonize technological innovation with unwavering ethical oversight, ensuring that future AI driven security solutions uphold core human values of fairness, transparency, and accountability while effectively countering emerging cyber threats.
arXiv Detail & Related papers (2025-02-28T20:23:35Z) - Position: A taxonomy for reporting and describing AI security incidents [57.98317583163334]
We argue that specific taxonomies are required to describe and report security incidents of AI systems.
Existing frameworks for either non-AI security or generic AI safety incident reporting are insufficient to capture the specific properties of AI security.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Considerations Influencing Offense-Defense Dynamics From Artificial Intelligence [0.0]
AI can enhance defensive capabilities but also presents avenues for malicious exploitation and large-scale societal harm. This paper proposes a taxonomy to map and examine the key factors that influence whether AI systems predominantly pose threats or offer protective benefits to society.
arXiv Detail & Related papers (2024-12-05T10:05:53Z) - Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training / learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
arXiv Detail & Related papers (2024-07-03T15:38:57Z) - AI Safety: A Climb To Armageddon? [0.0]
The paper examines three response strategies: Optimism, Mitigation, and Holism.
The surprising robustness of the argument forces a re-examination of core assumptions around AI safety.
arXiv Detail & Related papers (2024-05-30T08:41:54Z) - Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security [0.0]
This paper explores the integration of Artificial Intelligence (AI) into offensive cybersecurity.
It develops an autonomous AI agent, ReaperAI, designed to simulate and execute cyberattacks.
ReaperAI demonstrates the potential to identify, exploit, and analyze security vulnerabilities autonomously.
arXiv Detail & Related papers (2024-05-09T18:15:12Z) - Safety Cases: How to Justify the Safety of Advanced AI Systems [5.097102520834254]
As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them.
We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety.
We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.
arXiv Detail & Related papers (2024-03-15T16:53:13Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.