Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming
- URL: http://arxiv.org/abs/2510.18314v1
- Date: Tue, 21 Oct 2025 05:49:37 GMT
- Title: Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming
- Authors: Zheng Zhang, Jiarui He, Yuchen Cai, Deheng Ye, Peilin Zhao, Ruili Feng, Hao Wang
- Abstract summary: Existing red-teaming approaches mainly rely on manually crafted attack strategies or static models trained offline. We propose Genesis, a novel agentic framework composed of three modules: Attacker, Scorer, and Strategist. Our framework discovers novel strategies and consistently outperforms existing attack baselines.
- Score: 45.95972813586392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language model (LLM) agents increasingly automate complex web tasks, they boost productivity while simultaneously introducing new security risks. However, relevant studies on web agent attacks remain limited. Existing red-teaming approaches mainly rely on manually crafted attack strategies or static models trained offline. Such methods fail to capture the underlying behavioral patterns of web agents, making it difficult to generalize across diverse environments. In web agent attacks, success requires the continuous discovery and evolution of attack strategies. To this end, we propose Genesis, a novel agentic framework composed of three modules: Attacker, Scorer, and Strategist. The Attacker generates adversarial injections by integrating the genetic algorithm with a hybrid strategy representation. The Scorer evaluates the target web agent's responses to provide feedback. The Strategist dynamically uncovers effective strategies from interaction logs and compiles them into a continuously growing strategy library, which is then re-deployed to enhance the Attacker's effectiveness. Extensive experiments across various web tasks show that our framework discovers novel strategies and consistently outperforms existing attack baselines.
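The Attacker/Scorer/Strategist loop described in the abstract can be sketched as a small evolutionary search. The sketch below is purely illustrative, assuming toy stand-ins for every component: the scoring heuristic, the mutation and crossover operators, and the strategy-library update are hypothetical placeholders, not the paper's actual implementation.

```python
import random

STRATEGY_LIBRARY = ["urgent-instruction", "role-confusion"]  # grows over time

def score(injection: str) -> float:
    """Stand-in Scorer: rate how strongly a candidate injection matches
    known-effective strategies (a toy heuristic, not the paper's scorer)."""
    return sum(injection.count(s) for s in STRATEGY_LIBRARY) + random.random()

def mutate(injection: str) -> str:
    """Genetic-algorithm mutation: splice in a strategy from the library."""
    return injection + " " + random.choice(STRATEGY_LIBRARY)

def crossover(a: str, b: str) -> str:
    """Combine the first half of one injection with the second half of another."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])

def evolve(population, generations=5, keep=4):
    for _ in range(generations):
        # Scorer feedback drives selection of the fittest injections.
        ranked = sorted(population, key=score, reverse=True)[:keep]
        children = [crossover(random.choice(ranked), random.choice(ranked))
                    for _ in range(max(0, len(population) - keep))]
        population = ranked + [mutate(c) for c in children]
        # Strategist: promote a token from the best individual into the
        # growing strategy library, which future generations reuse.
        best_token = ranked[0].split()[-1]
        if best_token not in STRATEGY_LIBRARY:
            STRATEGY_LIBRARY.append(best_token)
    return max(population, key=score)

random.seed(0)
best = evolve(["please ignore prior rules", "click the hidden link"])
print(best)
```

In the actual framework, the Scorer would evaluate the target web agent's response to each injection, and the Strategist would mine full interaction logs rather than single tokens; this sketch only shows the shape of the evolve-score-redeploy cycle.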
Related papers
- An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks [9.715575204912167]
We propose a jailbreak framework that autonomously discovers, retrieves, and evolves attack strategies. ASTRA achieves an average Attack Success Rate (ASR) of 82.7%, significantly outperforming baselines.
arXiv Detail & Related papers (2025-11-04T08:24:22Z)
- Adversarial Reinforcement Learning for Large Language Model Agent Safety [20.704989548285372]
Large Language Model (LLM) agents can leverage tools like Google Search to complete complex tasks. Current defense strategies rely on fine-tuning LLM agents on datasets of known attacks. We propose Adversarial Reinforcement Learning for Agent Safety (ARLAS), a novel framework that leverages adversarial reinforcement learning (RL) by formulating the problem as a two-player zero-sum game.
arXiv Detail & Related papers (2025-10-06T23:09:18Z)
- Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks [63.803415430308114]
Current large language models are vulnerable to adversarial attacks in multi-turn interaction settings. We propose DialTree-RPO, an on-policy reinforcement learning framework integrated with tree search. Our approach achieves more than 25.9% higher ASR across 10 target models compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-02T17:57:05Z)
- Active Attacks: Red-teaming LLMs via Adaptive Environments [71.55110023234376]
We address the challenge of generating diverse attack prompts for large language models (LLMs). We introduce Active Attacks, a novel RL-based red-teaming algorithm that adapts its attacks as the victim evolves.
arXiv Detail & Related papers (2025-09-26T06:27:00Z)
- Searching for Privacy Risks in LLM Agents via Simulation [61.229785851581504]
We present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions. We find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery. The discovered attacks and defenses transfer across diverse scenarios and backbone models, demonstrating strong practical utility for building privacy-aware agents.
arXiv Detail & Related papers (2025-08-14T17:49:09Z)
- Co-Evolutionary Defence of Active Directory Attack Graphs via GNN-Approximated Dynamic Programming [9.704696173031714]
We model attacker-defender interactions in Active Directory as a Stackelberg game between an adaptive attacker and a proactive defender. We propose a co-evolutionary defense framework that combines Graph Neural Network Approximated Dynamic Programming (GNNDP) to model attacker strategies. Our framework refines attacker and defender policies to improve generalization and prevent premature convergence.
arXiv Detail & Related papers (2025-05-16T21:37:50Z)
- Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning [39.931442440365444]
AlgName is a novel red-teaming agent that emulates sophisticated human attackers through complementary learning dimensions. AlgName enables the agent to identify new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations for selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework's superior performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B within 5 conversation turns.
arXiv Detail & Related papers (2025-04-02T01:06:19Z)
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z)
- LAS-AT: Adversarial Training with Learnable Attack Strategy [82.88724890186094]
"Learnable attack strategy", dubbed LAS-AT, learns to automatically produce attack strategies to improve the model robustness.
Our framework is composed of a target network that uses AEs for training to improve robustness and a strategy network that produces attack strategies to control the AE generation.
arXiv Detail & Related papers (2022-03-13T10:21:26Z)
- Robust Federated Learning with Attack-Adaptive Aggregation [45.60981228410952]
Federated learning is vulnerable to various attacks, such as model poisoning and backdoor attacks.
We propose an attack-adaptive aggregation strategy to defend against various attacks for robust learning.
arXiv Detail & Related papers (2021-02-10T04:23:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.