Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense
- URL: http://arxiv.org/abs/2511.16483v1
- Date: Thu, 20 Nov 2025 15:54:08 GMT
- Title: Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense
- Authors: Sayak Mukherjee, Samrat Chatterjee, Emilie Purvine, Ted Fujimoto, Tegan Emerson,
- Abstract summary: We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies.<n>Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.
- Score: 3.2661946789427314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven experimental simulation environment. Multiple attack and defense agent personas were crafted, reflecting heterogeneity in agent actions, to generate LLM-guided reward designs where the LLM was first provided with contextual cyber simulation environment information. These reward structures were then utilized within a DRL-driven attack-defense simulation environment to learn an ensemble of cyber defense policies. Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.
Related papers
- Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization [51.12422886183246]
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks.<n>Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts.<n>We propose ACE-Safety, a novel framework that jointly optimize attack and defense models by seamlessly integrating two key innovative procedures.
arXiv Detail & Related papers (2025-11-24T15:23:41Z) - Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs [72.08224879435762]
textttLearn-to-Ask is a simulator-free framework for learning and deploying proactive dialogue agents.<n>Our approach culminates in the successful deployment of LLMs into a live, large-scale online AI service.
arXiv Detail & Related papers (2025-10-29T12:08:07Z) - Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment [3.572219661521267]
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment.<n>The environment captures realistic security trade-offs including background traffic noise, progressive exploitation mechanics, IP-based evasion tactics, honeypot traps, and rate-limiting defenses.
arXiv Detail & Related papers (2025-10-03T05:53:51Z) - Searching for Privacy Risks in LLM Agents via Simulation [61.229785851581504]
We present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions.<n>We find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery.<n>The discovered attacks and defenses transfer across diverse scenarios and backbone models, demonstrating strong practical utility for building privacy-aware agents.
arXiv Detail & Related papers (2025-08-14T17:49:09Z) - Reinforcement Learning for Decision-Level Interception Prioritization in Drone Swarm Defense [51.736723807086385]
We present a case study demonstrating the practical advantages of reinforcement learning in addressing this challenge.<n>We introduce a high-fidelity simulation environment that captures realistic operational constraints.<n>Agent learns to coordinate multiple effectors for optimal interception prioritization.<n>We evaluate the learned policy against a handcrafted rule-based baseline across hundreds of simulated attack scenarios.
arXiv Detail & Related papers (2025-08-01T13:55:39Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - Reinforcement Learning Environment with LLM-Controlled Adversary in D&D 5th Edition Combat [0.0]
This research employs Deep Q-Networks (DQN) for the smaller agents, creating a testbed for strategic AI development.<n>We successfully integrated sophisticated language models into the RL framework, enhancing strategic decision-making processes.
arXiv Detail & Related papers (2025-03-19T22:48:20Z) - Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense [9.927281246704604]
We propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks like network investigation and host recovery.<n>Our approach involves training sub-policies for each sub-task using PPO enhanced with cybersecurity domain expertise.<n>These sub-policies are then leveraged by a master defense policy that coordinates their selection to solve complex network defense tasks.
arXiv Detail & Related papers (2024-10-22T18:35:05Z) - Attack Prompt Generation for Red Teaming and Defending Large Language
Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z) - Learning Cyber Defence Tactics from Scratch with Multi-Agent
Reinforcement Learning [4.796742432333795]
Team of intelligent agents in computer network defence roles may reveal promising avenues to safeguard cyber and kinetic assets.
Agents are evaluated on their ability to jointly mitigate attacker activity in host-based defence scenarios.
arXiv Detail & Related papers (2023-08-25T14:07:50Z) - Deep Reinforcement Learning for Cyber System Defense under Dynamic
Adversarial Uncertainties [5.78419291062552]
We propose a data-driven deep reinforcement learning framework to learn proactive, context-aware defense countermeasures.
A dynamic defense optimization problem is formulated with multiple protective postures against different types of adversaries.
arXiv Detail & Related papers (2023-02-03T08:33:33Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.