Related papers: SoK: A Systems Perspective on Compound AI Threats and Countermeasures

SoK: A Systems Perspective on Compound AI Threats and Countermeasures

URL: http://arxiv.org/abs/2411.13459v1
Date: Wed, 20 Nov 2024 17:08:38 GMT
Title: SoK: A Systems Perspective on Compound AI Threats and Countermeasures
Authors: Sarbartha Banerjee, Prateek Sahu, Mulong Luo, Anjo Vahldiek-Oberwagner, Neeraja J. Yadwadkar, Mohit Tiwari,
Abstract summary: We discuss different software and hardware attacks applicable to compound AI systems. We show how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack.
Score: 3.458371054070399
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) used across enterprises often use proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple large language models (LLMs), the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often examines these elements in isolation, we find that combining cross-layer attack observations can enable powerful end-to-end attacks with minimal assumptions about the threat model. Given, the sheer number of existing attacks at each layer, we need a holistic and systemized understanding of different attack vectors at each layer. This SoK discusses different software and hardware attacks applicable to compound AI systems and demonstrates how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack. Next, we systematize the ML attacks in lines with the Mitre Att&ck framework to better position each attack based on the threat model. Finally, we outline the existing countermeasures for both software and hardware layers and discuss the necessity of a comprehensive defense strategy to enable the secure and high-performance deployment of compound AI systems.

Related papers

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z)
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.<n>Model Extraction Attacks pose threats to intellectual property, privacy, and system security.<n>We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security [32.73124984242397]
Quantum Machine Learning (QML) systems inherit vulnerabilities from classical machine learning.<n>We present a detailed taxonomy of QML attack vectors mapped to corresponding stages in a quantum-aware kill chain framework.<n>This work provides a foundation for more realistic threat modeling and proactive security-in-depth design in the emerging field of quantum machine learning.
arXiv Detail & Related papers (2025-07-11T14:25:36Z)
A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z)
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents. It is a plug-in framework and integrates easily into realistic agentic frameworks. It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z)
Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models. We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z)
AI-based Attacker Models for Enhancing Multi-Stage Cyberattack Simulations in Smart Grids Using Co-Simulation Environments [1.4563527353943984]
The transition to smart grids has increased the vulnerability of electrical power systems to advanced cyber threats. We propose a co-simulation framework that employs an autonomous agent to execute modular cyberattacks. Our approach offers a flexible, versatile source for data generation, aiding in faster prototyping and reducing development resources and time.
arXiv Detail & Related papers (2024-12-05T08:56:38Z)
A Unified Hardware-based Threat Detector for AI Accelerators [12.96840649714218]
We design UniGuard, a novel unified and non-intrusive detection methodology to safeguard FPGA-based AI accelerators. We employ a Time-to-Digital Converter to capture power fluctuations and train a supervised machine learning model to identify various types of threats.
arXiv Detail & Related papers (2023-11-28T10:55:02Z)
Baseline Defenses for Adversarial Attacks Against Aligned Language Models [109.75753454188705]
Recent work shows that text moderations can produce jailbreaking prompts that bypass defenses. We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training. We find that the weakness of existing discretes for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z)
A Human-in-the-Middle Attack against Object Detection Systems [4.764637544913963]
We propose a novel hardware attack inspired by Man-in-the-Middle attacks in cryptography. This attack generates a Universal Adversarial Perturbations (UAP) and injects the perturbation between the USB camera and the detection system. These findings raise serious concerns for applications of deep learning models in safety-critical systems, such as autonomous driving.
arXiv Detail & Related papers (2022-08-15T13:21:41Z)
Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research. Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
Attacks, Defenses, And Tools: A Framework To Facilitate Robust AI/ML Systems [2.5137859989323528]
Software systems are increasingly relying on Artificial Intelligence (AI) and Machine Learning (ML) components. This paper presents a framework to characterize attacks and weaknesses associated with AI-enabled systems.
arXiv Detail & Related papers (2022-02-18T22:54:04Z)
Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers. We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions. We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
Automating Cyber Threat Hunting Using NLP, Automated Query Generation, and Genetic Perturbation [8.669461942767098]
We have developed the WILEE system that cyber threat hunting by translating high-level threat descriptions into many possible concrete implementations. Both the (high-level) abstract and (low-level) concrete implementations are represented using a custom domain specific language. WILEE uses the implementations along with other logic, also written in the DSL, to automatically generate queries to confirm (or refute) any hypotheses tied to the potential adversarial.
arXiv Detail & Related papers (2021-04-23T13:19:12Z)
Adversarial Attack Attribution: Discovering Attributable Signals in Adversarial ML Attacks [0.7883722807601676]
Even production systems, such as self-driving cars and ML-as-a-service offerings, are susceptible to adversarial inputs. Can perturbed inputs be attributed to the methods used to generate the attack? We introduce the concept of adversarial attack attribution and create a simple supervised learning experimental framework to examine the feasibility of discovering attributable signals in adversarial attacks.
arXiv Detail & Related papers (2021-01-08T08:16:41Z)
Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models. In this paper, a new procedure called Composite Adrial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms. CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems [62.997667081978825]
We present a formal framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself.
arXiv Detail & Related papers (2020-04-09T10:56:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.