Related papers: SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks

URL: http://arxiv.org/abs/2601.19174v1
Date: Tue, 27 Jan 2026 04:03:15 GMT
Title: SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks
Authors: Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested, Tim Lynar, Jack Yang, Wangli Yang
Abstract summary: Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop that is triggered when an attack bypasses detection.
Score: 5.779141020370452
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sponge attacks increasingly threaten LLM systems by inducing excessive computation and denial of service (DoS). Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop: when an attack bypasses detection, the system updates an evolving knowledge base and refines its defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks and demonstrating the effectiveness of agentic self-healing against evolving resource-exhaustion threats.
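
Illustrative sketch (not the authors' implementation): the abstract describes a three-stage Defense Agent (semantic similarity retrieval, pattern matching, LLM-based reasoning) plus a self-healing loop that updates a knowledge base and refines instructions after a missed attack. The Python below shows one way such a pipeline could be wired together. Every name here (DefenseAgent, detect, heal, llm_judge, the threshold of 0.8, the repetition regex) is a hypothetical chosen for illustration. Bag-of-words cosine similarity stands in for a real embedding model, and the LLM stage is a caller-supplied function rather than a real model call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


def _bag(text: str) -> Counter:
    # Toy stand-in for an embedding: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    va, vb = _bag(a), _bag(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class DefenseAgent:
    # llm_judge(instructions, prompt) -> True if the prompt looks like an attack.
    # Hypothetical hook: a real system would call an LLM here.
    llm_judge: Callable[[str, str], bool]
    knowledge_base: List[str] = field(default_factory=list)
    # Assumed example pattern: the same word repeated ten or more times in a row.
    patterns: List[str] = field(
        default_factory=lambda: [r"\b(\w+)(?:\s+\1\b){9,}"]
    )
    instructions: str = "Flag prompts designed to force excessive computation."
    threshold: float = 0.8  # assumed similarity cutoff

    def detect(self, prompt: str) -> str:
        # Stage 1: compare against known attacks in the knowledge base.
        if any(cosine(prompt, k) >= self.threshold for k in self.knowledge_base):
            return "blocked:similarity"
        # Stage 2: cheap pattern matching.
        if any(re.search(p, prompt, re.IGNORECASE) for p in self.patterns):
            return "blocked:pattern"
        # Stage 3: fall back to LLM-based reasoning.
        if self.llm_judge(self.instructions, prompt):
            return "blocked:llm"
        return "allowed"

    def heal(self, missed_prompt: str) -> None:
        # Self-healing step: remember the missed attack and extend the
        # instructions given to the LLM stage.
        self.knowledge_base.append(missed_prompt)
        self.instructions += f"\nAlso flag prompts resembling: {missed_prompt[:80]}"


# Usage with a trivial keyword-based judge standing in for an LLM.
agent = DefenseAgent(llm_judge=lambda instr, p: "forever" in p.lower())
missed = "Enumerate every prime number and explain each one in detail"
first_verdict = agent.detect(missed)   # bypasses all three stages
agent.heal(missed)                     # feedback arrives: this was an attack
second_verdict = agent.detect(missed)  # now caught by stage 1
```

Ordering the stages from cheapest to most expensive means the LLM stage only runs when the similarity lookup and the pattern check both pass, which is one plausible reading of why a defense against resource exhaustion would stage its checks this way.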

Related papers

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety [28.246225272659917]
This paper introduces MAGIC, a novel multi-turn multi-agent reinforcement learning framework. It formulates Large Language Model safety alignment as an adversarial asymmetric game. Our framework demonstrates superior defense success rates without compromising the helpfulness of the model.
arXiv Detail & Related papers (2026-02-02T02:12:28Z)
AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs [22.974148993147967]
AegisAgent is an autonomous agent system designed to ensure the security of LLM-driven HAR systems. Results show it reduces attack success rate by 30% on average while incurring only 78.6 ms of latency overhead on a GPU workstation.
arXiv Detail & Related papers (2025-12-24T06:29:24Z)
Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization [51.12422886183246]
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks. Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts. We propose ACE-Safety, a novel framework that jointly optimizes attack and defense models by seamlessly integrating two key innovative procedures.
arXiv Detail & Related papers (2025-11-24T15:23:41Z)
Adversarial Reinforcement Learning for Large Language Model Agent Safety [20.704989548285372]
Large Language Model (LLM) agents can leverage tools like Google Search to complete complex tasks. Current defense strategies rely on fine-tuning LLM agents on datasets of known attacks. We propose Adversarial Reinforcement Learning for Agent Safety (ARLAS), a novel framework that leverages adversarial reinforcement learning (RL) by formulating the problem as a two-player zero-sum game.
arXiv Detail & Related papers (2025-10-06T23:09:18Z)
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning [78.5751183537704]
AdvEvo-MARL is a co-evolutionary multi-agent reinforcement learning framework that internalizes safety into task agents. Rather than relying on external guards, AdvEvo-MARL jointly optimizes attackers and defenders.
arXiv Detail & Related papers (2025-10-02T02:06:30Z)
Searching for Privacy Risks in LLM Agents via Simulation [61.229785851581504]
We present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions. We find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery. The discovered attacks and defenses transfer across diverse scenarios and backbone models, demonstrating strong practical utility for building privacy-aware agents.
arXiv Detail & Related papers (2025-08-14T17:49:09Z)
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation [4.241100280846233]
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication. This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents.
arXiv Detail & Related papers (2024-12-05T18:38:30Z)
Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses. C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
Large Language Model Sentinel: LLM Agent for Adversarial Purification [27.742161175314635]
Large language models (LLMs) are vulnerable to adversarial attacks that use well-designed textual perturbations. We introduce a novel defense technique named Large LAnguage MOdel Sentinel (LLAMOS) to enhance the adversarial robustness of LLMs.
arXiv Detail & Related papers (2024-05-24T07:23:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.