Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity?
- URL: http://arxiv.org/abs/2508.05670v1
- Date: Mon, 04 Aug 2025 08:57:14 GMT
- Title: Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity?
- Authors: Daniele Proverbio, Alessio Buscemi, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò
- Abstract summary: Large Language Models (LLMs) offer new tools and challenges for the security of computer systems. We investigate whether classical game-theoretic frameworks can effectively capture the behaviours of LLM-driven actors and bots.
- Score: 51.96049148869987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Game theory has long served as a foundational tool in cybersecurity to test, predict, and design strategic interactions between attackers and defenders. The recent advent of Large Language Models (LLMs) offers new tools and challenges for the security of computer systems. In this work, we investigate whether classical game-theoretic frameworks can effectively capture the behaviours of LLM-driven actors and bots. Using a reproducible framework for game-theoretic LLM agents, we investigate two canonical scenarios -- the one-shot zero-sum game and the dynamic Prisoner's Dilemma -- and we test whether LLMs converge to expected outcomes or exhibit deviations due to embedded biases. Our experiments involve four state-of-the-art LLMs and span five natural languages (English, French, Arabic, Vietnamese, and Mandarin Chinese) to assess linguistic sensitivity. For both games, we observe that the final payoffs are influenced by agents' characteristics, such as personality traits or knowledge of repeated rounds. Moreover, we uncover an unexpected sensitivity of the final payoffs to the choice of language, which warns against the indiscriminate application of LLMs in cybersecurity and calls for in-depth studies, as LLMs may behave differently when deployed in different countries. We also employ quantitative metrics to evaluate the internal consistency and cross-language stability of LLM agents, to help guide the selection of the most stable LLMs and the optimisation of models for secure applications.
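The dynamic Prisoner's Dilemma scenario described in the abstract can be sketched as a simple payoff-accounting loop. This is a minimal illustration only: the payoff values (T=5, R=3, P=1, S=0) are the standard textbook choice, not necessarily those used in the study, and the move sequences stand in for whatever the LLM agents would actually produce.

```python
# Standard Prisoner's Dilemma payoff matrix: (my_move, their_move) -> my payoff.
# "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): 3,  # mutual cooperation (reward R)
    ("C", "D"): 0,  # sucker's payoff (S)
    ("D", "C"): 5,  # temptation to defect (T)
    ("D", "D"): 1,  # mutual defection (punishment P)
}

def play_rounds(moves_a, moves_b):
    """Accumulate final payoffs for two agents over repeated rounds."""
    total_a = total_b = 0
    for a, b in zip(moves_a, moves_b):
        total_a += PAYOFFS[(a, b)]
        total_b += PAYOFFS[(b, a)]
    return total_a, total_b

# Example: an always-defecting agent against an always-cooperating one.
print(play_rounds(["D", "D", "D"], ["C", "C", "C"]))  # (15, 0)
```

Comparing such final payoffs against the game-theoretic prediction (mutual defection in the finitely repeated game) is the kind of convergence test the paper applies to LLM agents.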
Related papers
- Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks [22.908904483320953]
The capabilities of Large Language Models (LLMs) in coding tasks are often a reflection of their extensive pre-training corpora. We propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks.
arXiv Detail & Related papers (2026-01-16T09:06:47Z) - Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space [57.868527884634894]
Natural Language Actor-Critic is a novel actor-critic algorithm that trains policies using natural language rather than scalar values. We present results on a mixture of reasoning, web browsing, and tool-use with dialogue tasks, demonstrating that NLAC shows promise in outperforming existing training approaches.
arXiv Detail & Related papers (2025-12-04T09:21:44Z) - From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future [15.568939568441317]
We investigate current practices and solutions for large language models (LLMs) and LLM-based agents in software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering.
arXiv Detail & Related papers (2024-08-05T14:01:15Z) - Defending Against Social Engineering Attacks in the Age of LLMs [19.364994678178036]
Large Language Models (LLMs) can emulate human conversational patterns and facilitate chat-based social engineering (CSE) attacks.
This study investigates the dual capabilities of LLMs as both facilitators and defenders against CSE threats.
We propose ConvoSentinel, a modular defense pipeline that improves detection at both the message and the conversation levels.
arXiv Detail & Related papers (2024-06-18T04:39:40Z) - Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey [46.19229410404056]
Large language models (LLMs) have made remarkable advancements in natural language processing.
These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities.
Privacy and security issues have been revealed throughout their life cycle.
arXiv Detail & Related papers (2024-06-12T07:55:32Z) - Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity [0.0]
Large Language Models (LLMs) have the potential to enhance Agent-Based Modeling by better representing complex interdependent cybersecurity systems.
Existing evaluation frameworks often overlook the human factor and cognitive computing capabilities essential for interdependent cybersecurity.
I propose OllaBench, a novel evaluation framework that assesses LLMs' accuracy, wastefulness, and consistency in answering scenario-based information security compliance and non-compliance questions.
arXiv Detail & Related papers (2024-06-11T00:35:39Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z) - LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks [0.0]
We explore the intersection of Large Language Models (LLMs) and penetration testing. We introduce a fully automated privilege-escalation tool for evaluating the efficacy of LLMs for (ethical) hacking. We analyze the impact of different context sizes, in-context learning, optional high-level mechanisms, and memory management techniques.
arXiv Detail & Related papers (2023-10-17T17:15:41Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal the extent of language models' proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.