Quantifying Security Vulnerabilities: A Metric-Driven Security Analysis of Gaps in Current AI Standards
- URL: http://arxiv.org/abs/2502.08610v2
- Date: Sat, 26 Jul 2025 20:13:33 GMT
- Title: Quantifying Security Vulnerabilities: A Metric-Driven Security Analysis of Gaps in Current AI Standards
- Authors: Keerthana Madhavan, Abbas Yazdinejad, Fattane Zarrinkalam, Ali Dehghantanha
- Abstract summary: This paper audits and quantifies security risks in three major AI governance standards: NIST AI RMF 1.0, UK's AI and Data Protection Risk Toolkit, and the EU's ALTAI. Using a novel risk assessment methodology, we develop four key metrics: Risk Severity Index (RSI), Attack Potential Index (AVPI), Compliance-Security Gap Percentage (CSGP), and Root Cause Vulnerability Score (RCVS).
- Score: 5.388550452190688
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As AI systems integrate into critical infrastructure, security gaps in AI compliance frameworks demand urgent attention. This paper audits and quantifies security risks in three major AI governance standards: NIST AI RMF 1.0, UK's AI and Data Protection Risk Toolkit, and the EU's ALTAI. Using a novel risk assessment methodology, we develop four key metrics: Risk Severity Index (RSI), Attack Potential Index (AVPI), Compliance-Security Gap Percentage (CSGP), and Root Cause Vulnerability Score (RCVS). Our analysis identifies 136 concerns across the frameworks, exposing significant gaps. NIST fails to address 69.23 percent of identified risks, ALTAI has the highest attack vector vulnerability (AVPI = 0.51), and the ICO Toolkit has the largest compliance-security gap, with 80.00 percent of high-risk concerns remaining unresolved. Root cause analysis highlights under-defined processes (ALTAI RCVS = 0.33) and weak implementation guidance (NIST and ICO RCVS = 0.25) as critical weaknesses. These findings emphasize the need for stronger, enforceable security controls in AI compliance. We offer targeted recommendations to enhance security posture and bridge the gap between compliance and real-world AI risks.
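The abstract names four metrics (RSI, AVPI, CSGP, RCVS) but does not reproduce their formulas; those appear only in the full paper. The Python sketch below is a minimal illustration, assuming simple definitions (means, ratios, and percentages computed over per-concern ratings); the field names, the aggregation rules, and the toy data are assumptions made for illustration, not the authors' methodology.

# Illustrative sketch only: the paper defines RSI, AVPI, CSGP, and RCVS, but the
# exact formulas are not given in this abstract. The simple aggregations below
# are assumptions for illustration, not the authors' methodology.
from dataclasses import dataclass
from typing import List

@dataclass
class Concern:
    severity: float          # assumed 0-1 severity rating of the concern
    attack_potential: float  # assumed 0-1 attack-vector potential rating
    high_risk: bool          # whether the concern is rated high risk
    resolved: bool           # whether the framework resolves the concern
    root_cause: str          # e.g. "under-defined process", "weak implementation guidance"

def risk_severity_index(concerns: List[Concern]) -> float:
    """RSI: assumed here to be the mean severity across all concerns."""
    return sum(c.severity for c in concerns) / len(concerns)

def attack_potential_index(concerns: List[Concern]) -> float:
    """AVPI: assumed here to be the mean attack-vector potential across concerns."""
    return sum(c.attack_potential for c in concerns) / len(concerns)

def compliance_security_gap(concerns: List[Concern]) -> float:
    """CSGP: assumed share of high-risk concerns left unresolved, as a percentage."""
    high = [c for c in concerns if c.high_risk]
    unresolved = [c for c in high if not c.resolved]
    return 100.0 * len(unresolved) / len(high) if high else 0.0

def root_cause_vulnerability_score(concerns: List[Concern], cause: str) -> float:
    """RCVS: assumed fraction of concerns attributed to a given root cause."""
    return sum(1 for c in concerns if c.root_cause == cause) / len(concerns)

if __name__ == "__main__":
    # Toy data only; not the paper's catalog of 136 concerns.
    sample = [
        Concern(0.9, 0.6, True, False, "under-defined process"),
        Concern(0.7, 0.4, True, True, "weak implementation guidance"),
        Concern(0.5, 0.5, False, False, "under-defined process"),
        Concern(0.8, 0.3, True, False, "weak implementation guidance"),
    ]
    print(f"RSI  = {risk_severity_index(sample):.2f}")
    print(f"AVPI = {attack_potential_index(sample):.2f}")
    print(f"CSGP = {compliance_security_gap(sample):.2f}%")
    print(f"RCVS(under-defined process) = "
          f"{root_cause_vulnerability_score(sample, 'under-defined process'):.2f}")

On the toy data above, compliance_security_gap reports roughly 66.67% (2 of 3 high-risk concerns unresolved), mirroring the "unresolved high-risk concerns" reading behind the abstract's 80.00 percent CSGP figure for the ICO Toolkit.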
Related papers
- Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance? [2.010294990327175]
Current AI evaluation practices depend heavily on established benchmarks. These tools were not designed to measure the systemic risks that are the focus of the new regulatory landscape. This research addresses the urgent need to quantify this "benchmark-regulation gap".
arXiv Detail & Related papers (2025-08-07T15:03:39Z) - Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models [29.569220030102986]
We introduce Beyond Safe Answers (BSA), a novel benchmark comprising 2,000 challenging instances organized into three distinct SSA scenario types. Evaluations of 19 state-of-the-art LRMs demonstrate the difficulty of this benchmark, with top-performing models achieving only 38.0% accuracy in correctly identifying risk rationales. Our work provides a comprehensive assessment tool for evaluating and improving safety reasoning fidelity in LRMs, advancing the development of genuinely risk-aware and reliably safe AI systems.
arXiv Detail & Related papers (2025-05-26T08:49:19Z) - Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds [0.0]
We show how the co-option of "safety" has allowed AI technologists to engage in "safety revisionism."
We explore how the current trajectory for AI risk determination and evaluation for foundation model use within national security is poised for a race to the bottom.
arXiv Detail & Related papers (2025-04-21T13:20:56Z) - strideSEA: A STRIDE-centric Security Evaluation Approach [1.996354642790599]
strideSEA integrates STRIDE as the central classification scheme into the security activities of threat modeling, attack scenario analysis, risk analysis, and countermeasure recommendation.
The application of strideSEA is demonstrated in a real-world online immunization system case study.
arXiv Detail & Related papers (2025-03-24T18:00:17Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental shortcomings in addressing the distinctive characteristics of AI systems. Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized AI security incident reporting advancements.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z) - Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs [1.368472250332885]
Large language models are prone to misuse and vulnerable to security threats.
The European Union's Artificial Intelligence Act seeks to enforce AI robustness in certain contexts.
arXiv Detail & Related papers (2024-10-04T18:38:49Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment [18.966590454042272]
The study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks.
arXiv Detail & Related papers (2024-08-02T22:40:20Z) - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context.
We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z) - A Relevance Model for Threat-Centric Ranking of Cybersecurity Vulnerabilities [0.29998889086656577]
The relentless process of tracking and remediating vulnerabilities is a top concern for cybersecurity professionals.
We provide a framework for vulnerability management specifically focused on mitigating threats using adversary criteria derived from MITRE ATT&CK.
Our results show an average improvement of 71.5% to 91.3% in identifying vulnerabilities likely to be targeted and exploited by cyber threat actors.
arXiv Detail & Related papers (2024-06-09T23:29:12Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% (absolute) in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - QB4AIRA: A Question Bank for AI Risk Assessment [19.783485414942284]
QB4AIRA comprises 293 prioritized questions covering a wide range of AI risk areas.
It serves as a valuable resource for stakeholders in assessing and managing AI risks.
arXiv Detail & Related papers (2023-05-16T09:18:44Z)