Evaluating LLM Generated Detection Rules in Cybersecurity
- URL: http://arxiv.org/abs/2509.16749v1
- Date: Sat, 20 Sep 2025 17:21:51 GMT
- Title: Evaluating LLM Generated Detection Rules in Cybersecurity
- Authors: Anna Bertiger, Bobby Filar, Aryan Luthra, Stefano Meschiari, Aiden Mitchell, Sam Scholten, Vivek Sharath,
- Abstract summary: The benchmark employs a holdout set-based methodology to measure the effectiveness of LLM-generated security rules. It provides three key metrics inspired by the way experts evaluate security rules. This methodology is illustrated using rules from Sublime Security's detection team and those written by Sublime Security's Automated Detection Engineer.
- Score: 0.3469154896502103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLMs are increasingly pervasive in the security environment, with limited measures of their effectiveness, which limits trust and usefulness to security practitioners. Here, we present an open-source evaluation framework and benchmark metrics for evaluating LLM-generated cybersecurity rules. The benchmark employs a holdout set-based methodology to measure the effectiveness of LLM-generated security rules in comparison to a human-generated corpus of rules. It provides three key metrics inspired by the way experts evaluate security rules, offering a realistic, multifaceted evaluation of the effectiveness of an LLM-based security rule generator. This methodology is illustrated using rules from Sublime Security's detection team and those written by Sublime Security's Automated Detection Engineer (ADE), with a thorough analysis of ADE's skills presented in the results section.
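The abstract does not spell out the three metrics or the rule format, so the following Python sketch only illustrates the general shape of a holdout set-based evaluation: rules (represented here as simple predicates) are applied to a labeled holdout corpus, and the resulting detection and false-positive rates of the LLM-generated rule set are compared against a human-written baseline. The `Sample` and `Rule` types, the `evaluate` helper, and the metric names are illustrative assumptions, not the paper's framework or Sublime Security's API.

```python
# Minimal sketch of a holdout set-based evaluation of detection rules.
# Assumed, simplified representation: a rule is a predicate over a sample.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    content: str
    is_malicious: bool  # ground-truth label in the holdout set


Rule = Callable[[Sample], bool]


def evaluate(rules: list[Rule], holdout: list[Sample]) -> dict[str, float]:
    """Score a rule set on a labeled holdout corpus (illustrative metrics only)."""
    malicious = [s for s in holdout if s.is_malicious]
    benign = [s for s in holdout if not s.is_malicious]
    hits = sum(any(rule(s) for rule in rules) for s in malicious)
    false_alarms = sum(any(rule(s) for rule in rules) for s in benign)
    return {
        "detection_rate": hits / len(malicious) if malicious else 0.0,
        "false_positive_rate": false_alarms / len(benign) if benign else 0.0,
    }


# Compare LLM-generated rules against a human-written baseline on the same holdout set.
llm_rules: list[Rule] = [lambda s: "urgent wire transfer" in s.content.lower()]
human_rules: list[Rule] = [
    lambda s: "wire transfer" in s.content.lower() and "urgent" in s.content.lower()
]
holdout = [
    Sample("URGENT wire transfer needed today", is_malicious=True),
    Sample("Quarterly report attached", is_malicious=False),
]
print("LLM rules:  ", evaluate(llm_rules, holdout))
print("Human rules:", evaluate(human_rules, holdout))
```

In this reading, the same holdout corpus is scored once per rule set, so the human-written corpus serves as the reference point against which the LLM-generated rules are judged.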
Related papers
- From LLMs to Agents: A Comparative Evaluation of LLMs and LLM-based Agents in Security Patch Detection [42.089851895083804]
Large language models (LLMs) and LLM-based agents have demonstrated remarkable capabilities in various software engineering tasks. We conduct a comprehensive evaluation of the performance of LLMs and LLM-based agents for security patch detection. Our findings reveal that the Data-Aug LLM achieves the best overall performance, whereas the ReAct Agent demonstrates the lowest false positive rate (FPR).
arXiv Detail & Related papers (2025-11-11T09:58:41Z)
- SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents [25.567593463613388]
We present Sentinel, the first framework for formally evaluating the physical safety of Large Language Model (LLM)-based embodied agents. We apply Sentinel in VirtualHome and ALFRED, and formally evaluate multiple LLM-based embodied agents against diverse safety requirements.
arXiv Detail & Related papers (2025-10-14T20:53:51Z)
- The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs [57.1838332916627]
Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP). Their widespread deployment has also raised significant safety concerns. LLM-generated content can exhibit unsafe behaviors such as toxicity, bias, or misinformation, especially in adversarial contexts.
arXiv Detail & Related papers (2025-06-06T05:50:50Z)
- Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components [0.0]
This paper introduces a lightweight, interpretable benchmark to evaluate an agent's ability to uphold a high-level safety principle. Our evaluation reveals two primary findings: (1) a quantifiable "cost of compliance" where safety constraints degrade task performance even when compliant solutions exist, and (2) an "illusion of compliance" where high adherence often masks task incompetence rather than choice.
arXiv Detail & Related papers (2025-06-03T01:16:34Z)
- AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents [41.000042817113645]
AgentAuditor is a universal, training-free, memory-augmented reasoning framework. It constructs an experiential memory by having an LLM adaptively extract structured semantic features. The accompanying benchmark is the first designed to check how well LLM-based evaluators can spot both safety risks and security threats.
arXiv Detail & Related papers (2025-05-31T17:10:23Z)
- LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale. We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z)
- SandboxEval: Towards Securing Test Environment for Untrusted Code [2.603958690885184]
This work focuses on evaluating the security and confidentiality properties of test environments. We introduce SandboxEval, a test suite featuring manually crafted test cases that simulate real-world safety scenarios. We show, first, that the test suite accurately describes limitations placed on an LLM operating under instructions to generate malicious code.
arXiv Detail & Related papers (2025-03-27T19:56:00Z)
- Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) have achieved remarkable breakthroughs. Aligning their values with humans has become imperative for their responsible development. There is still a lack of evaluations of LLMs' values that fulfill three desirable goals.
arXiv Detail & Related papers (2025-01-13T05:53:56Z)
- SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts [0.6291443816903801]
This paper introduces a novel framework designed to autonomously evaluate the robustness of large language models (LLMs). Our method generates descriptive sentences from domain-constrained knowledge graph triplets to formulate adversarial prompts. This self-evaluation mechanism allows the LLM to evaluate its robustness without the need for external benchmarks.
arXiv Detail & Related papers (2024-12-01T10:58:53Z)
- Uncovering Safety Risks of Large Language Models through Concept Activation Vector [13.804245297233454]
We introduce a Safety Concept Activation Vector (SCAV) framework to guide attacks on large language models (LLMs). We then develop an SCAV-guided attack method that can generate both attack prompts and embedding-level attacks. Our attack method significantly improves the attack success rate and response quality while requiring less training data.
arXiv Detail & Related papers (2024-04-18T09:46:25Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief: by deploying carefully designed demonstrations, we show that base LLMs can effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward [9.218557081971708]
Large Language Models (LLMs) have seen widespread applications across numerous fields.
Their limited interpretability raises concerns about their safe operation from multiple perspectives.
Recent research has started developing quality assurance methods for LLMs.
arXiv Detail & Related papers (2024-04-12T14:55:16Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.