A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
- URL: http://arxiv.org/abs/2410.13897v1
- Date: Tue, 15 Oct 2024 02:51:32 GMT
- Title: A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
- Authors: Aviral Srivastava, Sourav Panda
- Abstract summary: As generative AI systems, including large language models (LLMs) and diffusion models, advance rapidly, their growing adoption has led to new and complex security risks.
This paper introduces a novel formal framework for categorizing and mitigating these emergent security risks.
We identify previously under-explored risks, including latent space exploitation, multi-modal cross-attack vectors, and feedback-loop-induced model degradation.
- Score: 0.3413711585591077
- License:
- Abstract: As generative AI systems, including large language models (LLMs) and diffusion models, advance rapidly, their growing adoption has led to new and complex security risks often overlooked in traditional AI risk assessment frameworks. This paper introduces a novel formal framework for categorizing and mitigating these emergent security risks by integrating adaptive, real-time monitoring and dynamic risk mitigation strategies tailored to generative models' unique vulnerabilities. We identify previously under-explored risks, including latent space exploitation, multi-modal cross-attack vectors, and feedback-loop-induced model degradation. Our framework employs a layered approach, incorporating anomaly detection, continuous red-teaming, and real-time adversarial simulation to mitigate these risks. We focus on formal verification methods to ensure model robustness and scalability in the face of evolving threats. Though theoretical, this work sets the stage for future empirical validation by establishing a detailed methodology and metrics for evaluating the performance of risk mitigation strategies in generative AI systems. This framework addresses existing gaps in AI safety, offering a comprehensive road map for future research and implementation.
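The layered approach described in the abstract lends itself to a simple runtime pattern: each generation request is scored by several independent checks, and any layer that exceeds its threshold triggers mitigation. The sketch below is a minimal illustration of that pattern, not code from the paper; the names (LayeredRiskMonitor, latent_anomaly_score, prompt_redteam_score) and the toy scoring heuristics are assumptions made purely for illustration.

```python
# Minimal sketch of a layered risk monitor, assuming (not taken from the paper)
# two toy checks: a latent-space anomaly score and a prompt-level red-team heuristic.
import inspect
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RiskFinding:
    layer: str
    score: float      # higher means riskier
    threshold: float

    @property
    def flagged(self) -> bool:
        return self.score >= self.threshold


def latent_anomaly_score(latent: Sequence[float], reference_norm: float = 1.0) -> float:
    """Toy anomaly check: relative deviation of the latent vector's L2 norm from a reference."""
    norm = math.sqrt(sum(x * x for x in latent))
    return abs(norm - reference_norm) / max(reference_norm, 1e-8)


def prompt_redteam_score(prompt: str,
                         patterns: Sequence[str] = ("ignore previous", "jailbreak")) -> float:
    """Toy red-team check: fraction of known adversarial patterns found in the prompt."""
    hits = sum(p in prompt.lower() for p in patterns)
    return hits / len(patterns)


class LayeredRiskMonitor:
    """Runs every registered check layer and reports which layers exceed their thresholds."""

    def __init__(self) -> None:
        self.layers: List[Tuple[str, Callable[..., float], float]] = []

    def add_layer(self, name: str, check: Callable[..., float], threshold: float) -> None:
        self.layers.append((name, check, threshold))

    def assess(self, **inputs) -> List[RiskFinding]:
        findings = []
        for name, check, threshold in self.layers:
            # Pass each check only the inputs it declares as parameters.
            params = inspect.signature(check).parameters
            score = check(**{k: v for k, v in inputs.items() if k in params})
            findings.append(RiskFinding(layer=name, score=score, threshold=threshold))
        return findings


if __name__ == "__main__":
    monitor = LayeredRiskMonitor()
    monitor.add_layer("latent_anomaly", latent_anomaly_score, threshold=0.5)
    monitor.add_layer("prompt_redteam", prompt_redteam_score, threshold=0.5)

    findings = monitor.assess(latent=[0.9, 1.4, -0.3],
                              prompt="Please ignore previous instructions")
    for f in findings:
        print(f"{f.layer}: score={f.score:.2f} flagged={f.flagged}")
```

In a real deployment the per-layer checks would be replaced by trained anomaly detectors over model activations and continuously updated red-team suites, and flagged findings would feed a mitigation and logging loop rather than a print statement.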
Related papers
- SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach [58.93030774141753]
Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence.
This paper conceptualizes cybersafety and cybersecurity in the context of multimodal learning.
We present a comprehensive Systematization of Knowledge (SoK) to unify these concepts in MFMs, identifying key threats.
arXiv Detail & Related papers (2024-11-17T23:06:20Z)
- EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [47.69642609574771]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EAIRiskBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints [0.0]
We introduce an innovative framework integrating diffusion models within the Multi-agent Reinforcement Learning paradigm.
This approach notably enhances the safety of actions taken by multiple agents by mitigating risk while still modeling coordinated action.
arXiv Detail & Related papers (2024-06-30T16:05:31Z)
- Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications [0.0]
Large Language Models (LLMs) have revolutionized various applications by providing advanced natural language processing capabilities.
This paper explores the threat modeling and risk analysis specifically tailored for LLM-powered applications.
arXiv Detail & Related papers (2024-06-16T16:43:58Z)
- Asset-centric Threat Modeling for AI-based Systems [7.696807063718328]
This paper presents ThreatFinderAI, an approach and tool to model AI-related assets, threats, countermeasures, and quantify residual risks.
To evaluate the practicality of the approach, participants were tasked with recreating a threat model of an AI-based healthcare platform that had been developed by cybersecurity experts.
Overall, the solution's usability was well received, and the tool effectively supports threat identification and risk discussion.
arXiv Detail & Related papers (2024-03-11T08:40:01Z)
- Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks [142.67349734180445]
Existing algorithms that provide risk-awareness to deep neural networks are complex and ad-hoc.
Here we present capsa, a framework for extending models with risk-awareness.
arXiv Detail & Related papers (2023-08-01T02:07:47Z)
- Typology of Risks of Generative Text-to-Image Models [1.933681537640272]
This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney.
Our review reveals significant knowledge gaps in the understanding and treatment of these risks, even though some of them are already being addressed.
We identify 22 distinct risk types, spanning issues from data bias to malicious use.
arXiv Detail & Related papers (2023-07-08T20:33:30Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Holistic Adversarial Robustness of Deep Learning Models [91.34155889052786]
Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability.
This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models.
arXiv Detail & Related papers (2022-02-15T05:30:27Z)
- Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
arXiv Detail & Related papers (2020-09-12T02:02:52Z)