DREAM: Dynamic Red-teaming across Environments for AI Models
- URL: http://arxiv.org/abs/2512.19016v1
- Date: Mon, 22 Dec 2025 04:11:57 GMT
- Title: DREAM: Dynamic Red-teaming across Environments for AI Models
- Authors: Liming Lu, Xiang Gu, Junyu Huang, Jiawei Du, Yunhuai Liu, Yongbin Zhou, Shuchao Pang,
- Abstract summary: We introduce DREAM, a framework for evaluating Large Language Models (LLMs) against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain stateful, cross-domain understanding of vulnerabilities. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models.
- Score: 28.267208528754082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly used in agentic systems, where their interactions with diverse tools and environments create complex, multi-stage safety challenges. However, existing benchmarks mostly rely on static, single-turn assessments that miss vulnerabilities from adaptive, long-chain attacks. To fill this gap, we introduce DREAM, a framework for systematic evaluation of LLM agents against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain stateful, cross-domain understanding of vulnerabilities. This graph guides a Contextualized Guided Policy Search (C-GPS) algorithm that dynamically constructs attack chains from a knowledge base of 1,986 atomic actions across 349 distinct digital environments. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models, showing the power of stateful, cross-environment exploits. Through analysis of these failures, we identify two key weaknesses in current agents: contextual fragility, where safety behaviors fail to transfer across environments, and an inability to track long-term malicious intent. Our findings also show that traditional safety measures, such as initial defense prompts, are largely ineffective against attacks that build context over multiple interactions. To advance agent safety research, we release DREAM as a tool for evaluating vulnerabilities and developing more robust defenses.
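To make the described pipeline concrete, below is a minimal, hypothetical sketch of how a cross-environment knowledge graph could guide a greedy search that assembles attack chains from atomic actions. Every name here (AtomicAction, CEAKG, c_gps, the scoring heuristic, and the toy phishing chain) is an illustrative assumption; the abstract does not specify the implementation at this level of detail.

```python
# Minimal illustrative sketch (not the authors' code): a toy cross-environment
# adversarial knowledge graph guiding a greedy policy search over atomic actions.
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicAction:
    """One atomic adversarial step tied to a digital environment."""
    name: str
    environment: str          # e.g. "email_client", "cloud_drive"
    preconditions: frozenset  # attacker knowledge required before this step
    yields: frozenset         # attacker knowledge gained if the step succeeds

class CEAKG:
    """Toy stand-in for the Cross-Environment Adversarial Knowledge Graph:
    it only tracks which pieces of attacker knowledge are established."""
    def __init__(self):
        self.state = set()

    def admissible(self, action: AtomicAction) -> bool:
        return action.preconditions <= self.state

    def absorb(self, action: AtomicAction) -> None:
        self.state |= action.yields

def c_gps(actions, goal, max_len=8):
    """Greedy stand-in for Contextualized Guided Policy Search: at each step,
    pick an admissible action that adds new knowledge, preferring pivots into
    an environment not yet visited (a simple cross-environment bias)."""
    graph, chain, visited = CEAKG(), [], set()
    for _ in range(max_len):
        if goal in graph.state:
            break
        candidates = [a for a in actions
                      if graph.admissible(a) and not a.yields <= graph.state]
        if not candidates:
            break
        step = max(candidates,
                   key=lambda a: (a.environment not in visited,
                                  len(a.yields - graph.state)))
        graph.absorb(step)
        visited.add(step.environment)
        chain.append(step.name)
    return chain, goal in graph.state

# Toy knowledge base: a two-environment chain (harvest -> phish -> exfiltrate).
kb = [
    AtomicAction("harvest_contact", "email_client",
                 frozenset(), frozenset({"victim_email"})),
    AtomicAction("send_phish", "email_client",
                 frozenset({"victim_email"}), frozenset({"session_token"})),
    AtomicAction("exfiltrate_files", "cloud_drive",
                 frozenset({"session_token"}), frozenset({"goal:data_leak"})),
]
print(c_gps(kb, "goal:data_leak"))
# -> (['harvest_contact', 'send_phish', 'exfiltrate_files'], True)
```

The tie-breaking tuple prefers actions that pivot into an unvisited environment, loosely mirroring the stateful, cross-environment exploits the abstract highlights.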
Related papers
- CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents [27.35968236632966]
LLM-based code interpreter agents are increasingly deployed in critical situations. Existing benchmarks fail to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context. We introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation.
arXiv Detail & Related papers (2026-02-23T06:41:41Z)
- OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage [59.3826294523924]
We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup. We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable.
arXiv Detail & Related papers (2026-02-13T21:32:32Z)
- Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System [4.864011355064205]
This paper introduces an automated red-teaming framework that generates, executes, and evaluates adversarial prompts to uncover security vulnerabilities in large language models (LLMs). Our framework integrates meta-prompting-based attack synthesis, multi-modal vulnerability detection, and standardized evaluation protocols spanning six major threat categories. Experiments on the GPT-OSS-20B model reveal 47 distinct vulnerabilities, including 21 high-severity and 12 novel attack patterns.
arXiv Detail & Related papers (2025-12-21T19:12:44Z)
- Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs [38.3239023969819]
Large Language Models (LLMs) have emerged as powerful tools for diverse applications. We identify a novel class of vulnerabilities, termed the Tool-Completion Attack (TCA). We introduce Context-Aware Hierarchical Learning (CAHL) to address these vulnerabilities.
arXiv Detail & Related papers (2025-12-03T12:10:21Z)
- AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [60.39655329875822]
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear. We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
arXiv Detail & Related papers (2025-11-15T10:30:46Z)
- Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms [1.03121181235382]
Large Language Model (LLM) agents face security vulnerabilities spanning AI-specific and traditional software domains. This study bridges this gap through comparative evaluation of the Function Calling architecture and Model Context Protocol (MCP) deployment paradigms. We tested 3,250 attack scenarios across seven language models, evaluating simple, composed, and chained attacks targeting both AI-specific threats and software vulnerabilities.
arXiv Detail & Related papers (2025-07-08T18:24:28Z)
- Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models [0.0]
Large Language Models (LLMs) are increasingly vulnerable to sophisticated multi-turn manipulation attacks. This paper introduces the Temporal Context Awareness framework, a novel defense mechanism designed to address this challenge. Preliminary evaluations on simulated adversarial scenarios demonstrate the framework's potential to identify subtle manipulation patterns.
arXiv Detail & Related papers (2025-03-18T22:30:17Z)
- Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework that reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z)
- Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z)
- In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [104.94706600050557]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantically meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models [8.345554966569479]
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation. Despite these advancements, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks. This review analyzes the state of research on these vulnerabilities and presents available defense strategies.
arXiv Detail & Related papers (2024-10-20T00:00:56Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)