Related papers: HAL 9000: a Risk Manager for ITSs

Related papers

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.<n>This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z)
Capability-Oriented Training Induced Alignment Risk [101.37328448441208]
We investigate whether language models, when trained with reinforcement learning, will spontaneously learn to exploit flaws to maximize their reward.<n>Our experiments show that models consistently learn to exploit these vulnerabilities, discovering opportunistic strategies that significantly increase their reward at the expense of task correctness or safety.<n>Our findings suggest that future AI safety work must extend beyond content moderation to rigorously auditing and securing the training environments and reward mechanisms themselves.
arXiv Detail & Related papers (2026-02-12T16:13:14Z)
The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment [148.80266237240713]
implicit training-time safety risks are driven by a model's internal incentives and contextual background information.<n>We present the first systematic study of this problem, introducing a taxonomy with five risk levels, ten fine-grained risk categories, and three incentive types.<n>Our results identify an overlooked yet urgent safety challenge in training.
arXiv Detail & Related papers (2026-02-04T04:23:58Z)
LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios [51.52395368061729]
We present LPS-Bench, a benchmark that evaluates the planning-time safety awareness of MCP-based CUAs under long-horizon tasks.<n> Experiments reveal substantial deficiencies in existing CUAs' ability to maintain safe behavior.<n>We propose mitigation strategies to improve long-horizon planning safety in MCP-based CUA systems.
arXiv Detail & Related papers (2026-02-03T08:40:24Z)
Toward Risk Thresholds for AI-Enabled Cyber Threats: Enhancing Decision-Making Under Uncertainty with Bayesian Networks [0.3151064009829256]
We propose a structured approach to developing and evaluating AI cyber risk thresholds.<n>First, we analyze existing industry cyber thresholds and identify common threshold elements.<n>Second, we propose the use of Bayesian networks as a tool for modeling AI-enabled cyber risk.
arXiv Detail & Related papers (2026-01-23T23:23:12Z)
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach [49.14349403242654]
We present $textbfPropensityBench$, a novel benchmark framework that assesses the proclivity of models to engage in risky behaviors.<n>Our framework includes 5,874 scenarios with 6,648 tools spanning four high-risk domains: cybersecurity, self-proliferation, biosecurity, and chemical security.<n>Across open-source and proprietary frontier models, we uncover 9 alarming signs of propensity: models frequently choose high-risk tools when under pressure.
arXiv Detail & Related papers (2025-11-24T18:46:44Z)
LM Agents May Fail to Act on Their Own Risk Knowledge [15.60032437959883]
Language model (LM) agents pose a diverse array of potential, severe risks in safety-critical scenarios.<n>While they often answer "Yes" to queries like "Is executing sudo rm -rf /*' dangerous?", they will likely fail to identify such risks in instantiated trajectories.
arXiv Detail & Related papers (2025-08-19T02:46:08Z)
A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 with New Scoring and Data Sources [2.735341331133159]
Intrusion Tolerant Systems (ITSs) have become increasingly critical due to the rise of multi-domain adversaries.<n>Existing ITS solutions often employ Risk Managers leveraging public security intelligence to adjust system defenses dynamically against emerging threats.<n>This work introduces a custom-built scraper that continuously mines diverse threat sources, including security advisories, research forums, and real-time exploit-of-concept.
arXiv Detail & Related papers (2025-08-18T21:16:05Z)
Legal Zero-Days: A Novel Risk Vector for Advanced AI Systems [0.0]
"Legal Zero-Days" are previously undiscovered vulnerabilities in legal frameworks that can cause immediate and significant societal disruption without requiring litigation or other processes before impact.<n>We present a risk model for identifying and evaluating these vulnerabilities, demonstrating their potential to bypass safeguards or impede government responses to AI incidents.
arXiv Detail & Related papers (2025-08-12T11:43:00Z)
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover [0.18472148461613155]
Large Language Model (LLM) agents and multi-agent systems introduce unprecedented security vulnerabilities.<n>This paper presents a comprehensive evaluation of the security of LLMs used as reasoning engines within autonomous agents.<n>We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers.
arXiv Detail & Related papers (2025-07-09T13:54:58Z)
SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z)
PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages [16.130469984234956]
PoCGen is a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages.<n>This is the first fully autonomous approach to use large language models (LLMs) in tandem with static and dynamic analysis techniques for PoC exploit generation.
arXiv Detail & Related papers (2025-06-05T12:37:33Z)
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities [6.752938800468733]
Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks. Existing benchmarks fall short as they are limited to abstracted Capture the Flag competitions or lack comprehensive coverage. We introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures.
arXiv Detail & Related papers (2025-03-21T17:32:32Z)
SoK: A Systems Perspective on Compound AI Threats and Countermeasures [3.458371054070399]
We discuss different software and hardware attacks applicable to compound AI systems. We show how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack.
arXiv Detail & Related papers (2024-11-20T17:08:38Z)
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics [70.93622520400385]
This paper systematically quantifies the robustness of VLA-based robotic systems. We introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions. We also design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments.
arXiv Detail & Related papers (2024-11-18T01:52:20Z)
Improving the Shortest Plank: Vulnerability-Aware Adversarial Training for Robust Recommender System [60.719158008403376]
Vulnerability-aware Adversarial Training (VAT) is designed to defend against poisoning attacks in recommender systems. VAT employs a novel vulnerability-aware function to estimate users' vulnerability based on the degree to which the system fits them.
arXiv Detail & Related papers (2024-09-26T02:24:03Z)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification [35.16099878559559]
Large language models (LLMs) have experienced significant development and are being deployed in real-world applications. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios.
arXiv Detail & Related papers (2024-07-30T14:35:31Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI. LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications. This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z)
Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security [0.0]
This paper explores the integration of Artificial Intelligence (AI) into offensive cybersecurity. It develops an autonomous AI agent, ReaperAI, designed to simulate and execute cyberattacks. ReaperAI demonstrates the potential to identify, exploit, and analyze security vulnerabilities autonomously.
arXiv Detail & Related papers (2024-05-09T18:15:12Z)
Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal [0.0]
We propose a risk assessment process using tools like the risk rating methodology which is used for traditional systems. We conduct scenario analysis to identify potential threat agents and map the dependent system components against vulnerability factors. We also map threats against three key stakeholder groups.
arXiv Detail & Related papers (2024-03-20T05:17:22Z)
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning [87.1610740406279]
White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. Current evaluations are private, preventing further research into mitigating risk. We publicly release the Weapons of Mass Destruction Proxy benchmark, a dataset of 3,668 multiple-choice questions.
arXiv Detail & Related papers (2024-03-05T18:59:35Z)
Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems. There is a lack of consensus about how exactly such risks arise, and how to manage them. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
Identifying the Risks of LM Agents with an LM-Emulated Sandbox [68.26587052548287]
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks. High cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z)
Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study [79.33449311057088]
This paper empirically reveals the high vulnerability of 11 typical incremental learners against poisoning-based backdoor attack on 3 learning scenarios. The defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.
arXiv Detail & Related papers (2023-05-28T09:17:48Z)
On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. We present ROAR, a new class of attacks that instantiate a variety of such threats. We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z)
CTI4AI: Threat Intelligence Generation and Sharing after Red Teaming AI Models [0.0]
A need to identify system vulnerabilities, potential threats, characterize properties that will enhance system robustness. A secondary need is to share this AI security threat intelligence between different stakeholders like, model developers, users, and AI/ML security professionals. In this paper, we create and describe a prototype system CTI4AI, to overcome the need to methodically identify and share AI/ML specific vulnerabilities and threat intelligence.
arXiv Detail & Related papers (2022-08-16T00:16:58Z)
Attack Techniques and Threat Identification for Vulnerabilities [1.1689657956099035]
prioritization and focus become critical, to spend their limited time on the highest risk vulnerabilities. In this work, we use machine learning and natural language processing techniques, as well as several publicly available data sets. We first map the vulnerabilities to a standard set of common weaknesses, and then common weaknesses to the attack techniques. This approach yields a Mean Reciprocal Rank (MRR) of 0.95, an accuracy comparable with those reported for state-of-the-art systems.
arXiv Detail & Related papers (2022-06-22T15:27:49Z)
Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022 [55.573187938617636]
The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities.
arXiv Detail & Related papers (2022-02-28T18:27:41Z)
Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers. We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions. We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents. We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation. Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
arXiv Detail & Related papers (2021-10-04T12:20:46Z)
The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
SHARKS: Smart Hacking Approaches for RisK Scanning in Internet-of-Things and Cyber-Physical Systems based on Machine Learning [5.265938973293016]
Cyber-physical systems (CPS) and Internet-of-Things (IoT) devices are increasingly being deployed across multiple functionalities. These devices are inherently not secure across their comprehensive software, hardware, and network stacks. We present an innovative technique for detecting unknown system vulnerabilities, managing these vulnerabilities, and improving incident response.
arXiv Detail & Related papers (2021-01-07T22:01:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.