Related papers: Computational Safety for Generative AI: A Signal Processing Perspective

URL: http://arxiv.org/abs/2502.12445v1
Date: Tue, 18 Feb 2025 02:26:50 GMT
Title: Computational Safety for Generative AI: A Signal Processing Perspective
Authors: Pin-Yu Chen
Abstract summary: Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI. We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
Score: 65.268245109828
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI safety is a rapidly growing area of research that seeks to prevent the harm and misuse of frontier AI technology, particularly with respect to generative AI (GenAI) tools that are capable of creating realistic and high-quality content through text prompts. Examples of such tools include large language models (LLMs) and text-to-image (T2I) diffusion models. As the performance of various leading GenAI models approaches saturation due to similar training data sources and neural network architecture designs, the development of reliable safety guardrails has become a key differentiator for responsibility and sustainability. This paper presents a formalization of the concept of computational safety, which is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI through the lens of signal processing theory and methods. In particular, we explore two exemplary categories of computational safety challenges in GenAI that can be formulated as hypothesis testing problems. For the safety of model input, we show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. For the safety of model output, we elucidate how statistical signal processing and adversarial learning can be used to detect AI-generated content. Finally, we discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
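As a rough illustration of the hypothesis-testing framing mentioned in the abstract, the sketch below treats detection as a binary test on a scalar score: H0 means the input is benign (or the content is human-made), H1 means it should be flagged. This is not the paper's method. The function names, the Gaussian score distributions, and the 5% false-positive target are all assumptions made for the example.

```python
import random

def calibrate_threshold(null_scores, target_fpr):
    """Pick a threshold so that at most about target_fpr of H0 scores exceed it."""
    ranked = sorted(null_scores)
    # Index of the empirical (1 - target_fpr) quantile, clamped to the list.
    k = min(len(ranked) - 1, int((1.0 - target_fpr) * len(ranked)))
    return ranked[k]

def decide(score, threshold):
    """Return True to reject H0, i.e. flag the sample."""
    return score > threshold

# Synthetic scores (assumption): benign samples centred at 0, unsafe at 3.
rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(2000)]
unsafe = [rng.gauss(3.0, 1.0) for _ in range(2000)]

tau = calibrate_threshold(benign, target_fpr=0.05)
fpr = sum(decide(s, tau) for s in benign) / len(benign)  # false-positive rate
tpr = sum(decide(s, tau) for s in unsafe) / len(unsafe)  # detection rate
print(f"threshold={tau:.3f} fpr={fpr:.3f} tpr={tpr:.3f}")
```

In practice the score would come from a detector such as a sensitivity or loss-landscape statistic; here it is simulated only to show how a threshold trades false positives against detections.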

Related papers

Detecting Cybersecurity Threats by Integrating Explainable AI with SHAP Interpretability and Strategic Data Sampling [0.0]
The framework addresses three fundamental challenges in deploying AI for threat detection. Our approach maintains detection efficacy while reducing computational overhead. It provides a robust foundation for deploying trustworthy AI systems in security operations centers.
arXiv Detail & Related papers (2026-02-22T08:01:14Z)
AI Safeguards, Generative AI and the Pandora Box: AI Safety Measures to Protect Businesses and Personal Reputation [0.0]
Generative AI has unleashed the power of content generation and has unwittingly opened the box of realistic deepfakes. Resolution and hybridization detection techniques using neural networks allow such content to be flagged. Good detection and flagging techniques enable AI safety, which is the main focus of this paper.
arXiv Detail & Related papers (2026-01-08T06:58:42Z)
Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models. Each model decomposes attacks into steps using the MITRE ATT&CK framework. Individual estimates are aggregated through Monte Carlo simulation.
arXiv Detail & Related papers (2025-12-09T17:54:17Z)
Algorithms for Adversarially Robust Deep Learning [58.656107500646364]
We discuss recent progress toward designing algorithms that exhibit desirable robustness properties. We present new algorithms that achieve state-of-the-art generalization in medical imaging, molecular identification, and image classification. We propose new attacks and defenses, which represent the frontier of progress toward designing robust language-based agents.
arXiv Detail & Related papers (2025-09-23T14:48:58Z)
A Peek Behind the Curtain: Using Step-Around Prompt Engineering to Identify Bias and Misinformation in GenAI Models [0.0]
We discuss how Internet-sourced training data introduces unintended biases and misinformation into AI systems. We argue that step-around prompting serves a vital role in identifying potential vulnerabilities while acknowledging its dual nature as both a research tool and a security threat.
arXiv Detail & Related papers (2025-03-19T13:47:28Z)
Safety at Scale: A Comprehensive Survey of Large Model Safety [298.05093528230753]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z)
Bringing Order Amidst Chaos: On the Role of Artificial Intelligence in Secure Software Engineering [0.0]
The ever-evolving technological landscape offers both opportunities and threats, creating a dynamic space where chaos and order compete. Secure software engineering (SSE) must continuously address vulnerabilities that endanger software systems. This thesis seeks to bring order to the chaos in SSE by addressing domain-specific differences that impact AI accuracy.
arXiv Detail & Related papers (2025-01-09T11:38:58Z)
Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR. First, we illustrate the common data generation-based applications in the SAR field. Then, the latest GenAI models are systematically reviewed. Finally, the corresponding applications in the SAR domain are also included.
arXiv Detail & Related papers (2024-11-05T03:06:00Z)
An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs [1.9662978733004601]
This paper presents an innovative framework for real-time IoT attack detection and response that leverages Machine Learning (ML), Explainable AI (XAI), and Large Language Models (LLMs). Our end-to-end framework not only facilitates a seamless transition from model development to deployment but also represents a real-world application capability that is often lacking in existing research.
arXiv Detail & Related papers (2024-09-20T03:09:23Z)
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
Review of Generative AI Methods in Cybersecurity [0.6990493129893112]
This paper provides a comprehensive overview of the current state-of-the-art deployments of Generative AI (GenAI). It covers assaults, jailbreaking, and applications of prompt injection and reverse psychology. It also provides the various applications of GenAI in cybercrimes, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware.
arXiv Detail & Related papers (2024-03-13T17:05:05Z)
Cloud-based XAI Services for Assessing Open Repository Models Under Adversarial Attacks [7.500941533148728]
We propose a cloud-based service framework that encapsulates computing components and assessment tasks into pipelines. We demonstrate the application of XAI services for assessing five quality attributes of AI models.
arXiv Detail & Related papers (2024-01-22T00:37:01Z)
Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness. We combine the SE concept of code complexity with the AI technique of curriculum learning. We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
On Safety Assessment of Artificial Intelligence [0.0]
We show that many models of artificial intelligence, in particular machine learning, are statistical models. Part of the budget of dangerous random failures for the relevant safety integrity level needs to be used for the probabilistic faulty behavior of the AI system. We propose a research challenge that may be decisive for the use of AI in safety related systems.
arXiv Detail & Related papers (2020-02-29T14:05:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.