Related papers: Proactively Detecting Threats: A Novel Approach Using LLMs

Proactively Detecting Threats: A Novel Approach Using LLMs

URL: http://arxiv.org/abs/2601.09029v1
Date: Tue, 13 Jan 2026 23:28:33 GMT
Title: Proactively Detecting Threats: A Novel Approach Using LLMs
Authors: Aniesh Chawla, Udbhav Prasad,
Abstract summary: This paper presents the first systematic evaluation of large language models (LLMs) to proactively identify indicators of compromise (IOCs)<n>We developed an automated system that pulls IOCs from 15 web-based threat report sources to evaluate six LLM models.<n>Gemini 1.5 Pro achieved 0.958 precision and 0.788 specificity for malicious IOC identification, while demonstrating perfect recall (1.0) for actual threats.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Enterprise security faces escalating threats from sophisticated malware, compounded by expanding digital operations. This paper presents the first systematic evaluation of large language models (LLMs) to proactively identify indicators of compromise (IOCs) from unstructured web-based threat intelligence sources, distinguishing it from reactive malware detection approaches. We developed an automated system that pulls IOCs from 15 web-based threat report sources to evaluate six LLM models (Gemini, Qwen, and Llama variants). Our evaluation of 479 webpages containing 2,658 IOCs (711 IPv4 addresses, 502 IPv6 addresses, 1,445 domains) reveals significant performance variations. Gemini 1.5 Pro achieved 0.958 precision and 0.788 specificity for malicious IOC identification, while demonstrating perfect recall (1.0) for actual threats.

Related papers

MI$^2$DAS: A Multi-Layer Intrusion Detection Framework with Incremental Learning for Securing Industrial IoT Networks [47.386868423451595]
MI$2$DAS is a multi-layer intrusion detection framework that integrates anomaly-based hierarchical traffic pooling and open-set recognition.<n>Experiments conducted on the Edge-IIoTset dataset demonstrate strong performance across all layers.<n>These results showcase MI$2$DAS as an effective, scalable and adaptive framework for enhancing IIoT security.
arXiv Detail & Related papers (2026-02-27T09:37:05Z)
Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection [0.0]
Deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals.<n> phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection.<n>We present a benchmark of LLMs under a unified zero-shot and few-shot prompting framework.
arXiv Detail & Related papers (2026-02-02T18:56:06Z)
Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models [0.0]
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks.<n>We introduce Jailbreak Mimicry, a systematic methodology for training compact attacker models to automatically generate narrative-based jailbreak prompts.<n>Our approach transforms adversarial prompt discovery from manual craftsmanship into a reproducible scientific process.
arXiv Detail & Related papers (2025-10-24T23:53:16Z)
ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection [43.41293570032631]
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection.<n>We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning.<n>We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
arXiv Detail & Related papers (2025-10-20T03:23:41Z)
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models [50.21378052667732]
We conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics.<n>We propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach.
arXiv Detail & Related papers (2025-09-29T05:17:10Z)
VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation.<n>It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective.<n>On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable--fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents [1.7587442088965226]
LLM-based coding agents are rapidly being deployed in software development, yet their safety implications remain poorly understood.<n>We conducted the first systematic safety evaluation of autonomous coding agents, analyzing over 12,000 actions across five state-of-the-art models.<n>We developed a high-precision detection system that identified four major vulnerability categories, with information exposure the most prevalent.
arXiv Detail & Related papers (2025-07-12T16:11:07Z)
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover [0.0]
Large Language Model (LLM) agents and multi-agent systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises.<n>This paper presents a comprehensive evaluation of the LLMs security used as reasoning engines within autonomous agents.<n>We show how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers.
arXiv Detail & Related papers (2025-07-09T13:54:58Z)
Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms [1.03121181235382]
Large Language Model (LLM) agents face security vulnerabilities spanning AI-specific and traditional software domains.<n>This study bridges this gap through comparative evaluation of Function Calling architecture and Model Context Protocol (MCP) deployment paradigms.<n>We tested 3,250 attack scenarios across seven language models, evaluating simple, composed, and chained attacks targeting both AI-specific threats and software vulnerabilities.
arXiv Detail & Related papers (2025-07-08T18:24:28Z)
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale [45.97598662617568]
We introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects.<n>We show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches.<n>These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.<n>It is on par with existing deep learning techniques but has better explainability.<n>It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI [24.312198733476063]
Open-source cyber threat intelligence (OS-CTI) is a valuable resource for threat hunters. Previous studies aimed at automating OSCTI analysis failed to provide actionable outputs. We propose LLMCloudHunter, a novel framework that automatically generates generic-signature detection rule candidates from OSCTI data.
arXiv Detail & Related papers (2024-07-06T21:43:35Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.