The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation
- URL: http://arxiv.org/abs/2512.12078v1
- Date: Fri, 12 Dec 2025 22:53:52 GMT
- Title: The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation
- Authors: Ágney Lopes Roth Ferraz, Sidnei Barbieri, Murray Evangelista de Souza, Lourenço Alves Pereira Júnior
- Abstract summary: Cyber threat intelligence (CTI) encoded in STIX and structured according to the MITRE ATT&CK framework has become a global reference for describing adversary behavior. We ask whether its structured artifacts contain sufficient behavioral detail to support multi-stage adversary emulation.
- Score: 0.5399800035598185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyber threat intelligence (CTI) encoded in STIX and structured according to the MITRE ATT&CK framework has become a global reference for describing adversary behavior. However, ATT&CK was designed as a descriptive knowledge base rather than a procedural model. We therefore ask whether its structured artifacts contain sufficient behavioral detail to support multi-stage adversary emulation. Through systematic measurements of the ATT&CK Enterprise bundle, we show that campaign objects encode just fragmented slices of behavior. Only 35.6% of techniques appear in at least one campaign, and neither clustering nor sequence analysis reveals any reusable behavioral structure under technique overlap or LCS-based analyses. Intrusion sets cover a broader portion of the technique space, yet omit the procedural semantics required to transform behavioral knowledge into executable chains, including ordering, preconditions, and environmental assumptions. These findings reveal a procedural semantic gap in current CTI standards: they describe what adversaries do, but not exactly how that behavior was operationalized. To assess how far this gap can be bridged in practice, we introduce a three-stage methodology that translates behavioral information from structured CTI into executable steps and makes the necessary environmental assumptions explicit. We demonstrate its viability by instantiating the resulting steps as operations in the MITRE Caldera framework. Case studies of ShadowRay and Soft Cell show that structured CTI can enable the emulation of multi-stage APT campaigns, but only when analyst-supplied parameters and assumptions are explicitly recorded. This, in turn, exposes the precise points at which current standards fail to support automation. Our results clarify the boundary between descriptive and machine-actionable CTI for adversary emulation.
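The two measurements named in the abstract (the share of techniques that appear in at least one campaign, and LCS-based sequence comparison) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a STIX 2.x bundle already loaded as a Python dict, in which campaigns link to techniques through `relationship` objects of type `uses`, as in the public ATT&CK STIX data. The function names, the toy bundle, and the ordered technique lists are hypothetical; campaign objects do not themselves encode ordering, so any sequence passed to `lcs_length` has to come from an analyst or another source.

```python
from typing import Dict, List, Sequence


def campaign_technique_coverage(bundle: Dict) -> float:
    """Fraction of attack-pattern objects used by at least one campaign."""
    objects = bundle.get("objects", [])
    techniques = {o["id"] for o in objects if o.get("type") == "attack-pattern"}
    used = {
        o["target_ref"]
        for o in objects
        if o.get("type") == "relationship"
        and o.get("relationship_type") == "uses"
        and o.get("source_ref", "").startswith("campaign--")
        and o.get("target_ref") in techniques
    }
    return len(used) / len(techniques) if techniques else 0.0


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    dp: List[List[int]] = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


# Hypothetical toy bundle: four techniques, one campaign that uses two of them.
toy_bundle = {
    "type": "bundle",
    "objects": [
        {"type": "attack-pattern", "id": "attack-pattern--a"},
        {"type": "attack-pattern", "id": "attack-pattern--b"},
        {"type": "attack-pattern", "id": "attack-pattern--c"},
        {"type": "attack-pattern", "id": "attack-pattern--d"},
        {"type": "campaign", "id": "campaign--1"},
        {"type": "relationship", "relationship_type": "uses",
         "source_ref": "campaign--1", "target_ref": "attack-pattern--a"},
        {"type": "relationship", "relationship_type": "uses",
         "source_ref": "campaign--1", "target_ref": "attack-pattern--b"},
    ],
}

coverage = campaign_technique_coverage(toy_bundle)
shared = lcs_length(["T1", "T2", "T3"], ["T2", "T4", "T3"])
print(f"coverage={coverage:.2f}, lcs={shared}")
```

The sketch does not filter revoked or deprecated objects, which a real measurement over the ATT&CK Enterprise bundle would likely need to do.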
Related papers
- On the Paradoxical Interference between Instruction-Following and Task Solving [50.75960598434753]
Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
arXiv Detail & Related papers (2026-01-29T17:48:56Z) - Constructing Multi-label Hierarchical Classification Models for MITRE ATT&CK Text Tagging [0.0]
We provide a "task space" characterization of the MITRE ATT&CK text tagging task. We construct our own multi-label hierarchical classification models for the text tagging task. Our models meet or surpass state-of-the-art performance while relying only on classical machine learning methods.
arXiv Detail & Related papers (2026-01-21T00:41:34Z) - S-DAPT-2026: A Stage-Aware Synthetic Dataset for Advanced Persistent Threat Detection [0.0538441598991272]
This paper presents a near-realistic synthetic APT dataset and an efficient alert correlation framework. The proposed approach introduces a machine-learning-based correlation module that employs K Nearest Neighbors (KNN) clustering with a cosine similarity metric to group semantically related alerts. A comprehensive statistical characterization of the dataset is provided to facilitate aware and support APT stage predictions.
arXiv Detail & Related papers (2026-01-10T21:25:41Z) - Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools? [1.8549313085249322]
This work presents the first comprehensive evaluation of Large Language Models (LLMs) across a multi-category interaction threat taxonomy. We benchmark Llama 3.1 8B, Llama 70B, GPT-4o, Gemini-2.5-Pro, and DeepSeek-R1 across zero-, one-, and two-shot settings. Our findings show that while LLMs exhibit promising semantic understanding, their accuracy degrades significantly for threats requiring cross-rule structural reasoning.
arXiv Detail & Related papers (2026-01-02T04:17:36Z) - From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions [15.710492251334792]
TTPrompt is a framework shifting from implicit induction to explicit instruction. FIR enables LLMs to self-refine guidelines by learning from errors on minimal labeled data. With refinement on just 1% of training data, TTPrompt rivals models fine-tuned on the full dataset.
arXiv Detail & Related papers (2025-12-22T14:13:01Z) - CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence [48.63397742510097]
Cyber threat intelligence (CTI) is central to modern cybersecurity, providing critical insights for detecting and mitigating evolving threats. With the natural language understanding and reasoning capabilities of large language models (LLMs), there is increasing interest in applying them to CTI. We present CTIArena, the first benchmark for evaluating LLM performance on heterogeneous, multi-source CTI.
arXiv Detail & Related papers (2025-10-13T22:10:17Z) - Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. This paper decomposes LLM applications into a three-layer architecture: System Shell Layer, Prompt Orchestration Layer, and LLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z) - Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills [3.0620527758972496]
This paper identifies and analyzes a novel vulnerability class in Model Context Protocol based agent systems. The attack chain describes and demonstrates how benign, individually authorized tasks can be orchestrated to produce harmful emergent behaviors.
arXiv Detail & Related papers (2025-08-27T01:11:59Z) - AttackSeqBench: Benchmarking Large Language Models in Analyzing Attack Sequences within Cyber Threat Intelligence [17.234214109636113]
Cyber Threat Intelligence (CTI) reports document observations of cyber threats, synthesizing evidence about adversaries' actions and intent into actionable knowledge. The unstructured and verbose nature of CTI reports poses significant challenges for security practitioners to manually extract and analyze such sequences. Although large language models (LLMs) exhibit promise in cybersecurity tasks such as entity extraction and knowledge graph construction, their understanding and reasoning capabilities towards behavioral sequences remain underexplored.
arXiv Detail & Related papers (2025-03-05T04:25:21Z) - Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation [21.345548821276097]
Class activation maps (CAMs) are commonly employed in weakly supervised semantic segmentation (WSSS) to produce pseudo-labels.
We propose an end-to-end WSSS model incorporating guided CAMs, wherein our segmentation model is trained while concurrently optimizing CAMs online.
CoSA is the first single-stage approach to outperform all existing multi-stage methods including those with additional supervision.
arXiv Detail & Related papers (2024-02-27T21:08:23Z) - Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips the UAD with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z) - CSL: Class-Agnostic Structure-Constrained Learning for Segmentation
Including the Unseen [62.72636247006293]
Class-Agnostic Structure-Constrained Learning is a plug-in framework that can integrate with existing methods.
We propose soft assignment and mask split methodologies that enhance OOD object segmentation.
Empirical evaluations demonstrate CSL's prowess in boosting the performance of existing algorithms spanning OOD segmentation, ZS3, and DA segmentation.
arXiv Detail & Related papers (2023-12-09T11:06:18Z) - Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z) - Model-Agnostic Few-Shot Open-Set Recognition [36.97433312193586]
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem.
We focus on developing model-agnostic inference methods that can be plugged into any existing model.
We introduce an Open Set Transductive Information Maximization method OSTIM.
arXiv Detail & Related papers (2022-06-18T16:27:59Z) - FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.