Instantiating Standards: Enabling Standard-Driven Text TTP Extraction with Evolvable Memory
- URL: http://arxiv.org/abs/2505.09261v1
- Date: Wed, 14 May 2025 10:22:13 GMT
- Title: Instantiating Standards: Enabling Standard-Driven Text TTP Extraction with Evolvable Memory
- Authors: Cheng Meng, ZhengWei Jiang, QiuYun Wang, XinYi Li, ChunYan Ma, FangMing Dong, FangLi Ren, BaoXu Liu
- Abstract summary: We introduce a novel framework that converts abstract standard definitions into actionable, contextualized knowledge. Our method uses a Large Language Model (LLM) to generate, update, and apply this knowledge. Experiments show our framework boosts Technique F1 scores by 11% over GPT-4o.
- Score: 4.909107168534244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting MITRE ATT&CK Tactics, Techniques, and Procedures (TTPs) from natural language threat reports is crucial yet challenging. Existing methods primarily focus on performance metrics using data-driven approaches, often neglecting mechanisms to ensure faithful adherence to the official standard. This deficiency compromises the reliability and consistency of TTP assignments, creating intelligence silos and contradictory threat assessments across organizations. To address this, we introduce a novel framework that converts abstract standard definitions into actionable, contextualized knowledge. Our method uses a Large Language Model (LLM) to generate, update, and apply this knowledge. The framework populates an evolvable memory with dual-layer situational knowledge instances derived from labeled examples and official definitions. The first layer identifies situational contexts (e.g., "Communication with C2 using encoded subdomains"), while the second layer captures distinctive features that differentiate similar techniques (e.g., distinguishing T1132 "Data Encoding" from T1071 "Application Layer Protocol" based on whether the focus is on encoding methods or protocol usage). This structured approach provides a transparent basis for explainable TTP assignments and enhanced human oversight, while also helping to standardize other TTP extraction systems. Experiments show our framework (using Qwen2.5-32B) boosts Technique F1 scores by 11% over GPT-4o. Qualitative analysis confirms superior standardization, enhanced transparency, and improved explainability in real-world threat intelligence scenarios. To the best of our knowledge, this is the first work that uses an LLM to generate, update, and apply new knowledge for TTP extraction.
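The dual-layer memory described in the abstract can be pictured with a small data structure. The following is only an illustrative sketch, not the authors' implementation: the class names `KnowledgeInstance` and `EvolvableMemory`, the method names, and the exact-string lookup are all assumptions made for this example (the paper uses an LLM to generate, update, and apply the knowledge). The two sample entries reuse the T1132 / T1071 example from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeInstance:
    """One hypothetical memory entry (names are illustrative assumptions)."""
    technique_id: str         # ATT&CK technique ID, e.g. "T1132"
    technique_name: str       # e.g. "Data Encoding"
    situation: str            # layer 1: situational context
    distinctive_feature: str  # layer 2: what separates it from similar techniques


@dataclass
class EvolvableMemory:
    """Toy memory keyed by exact situation text (an assumption for brevity)."""
    instances: List[KnowledgeInstance] = field(default_factory=list)

    def add(self, instance: KnowledgeInstance) -> None:
        self.instances.append(instance)

    def update_feature(self, technique_id: str, situation: str, new_feature: str) -> bool:
        """Revise the second-layer feature of a matching entry; True if one was found."""
        for inst in self.instances:
            if inst.technique_id == technique_id and inst.situation == situation:
                inst.distinctive_feature = new_feature
                return True
        return False

    def candidates(self, situation: str) -> List[KnowledgeInstance]:
        """Return every technique recorded for this situational context."""
        return [inst for inst in self.instances if inst.situation == situation]


def build_example_memory() -> EvolvableMemory:
    """Populate the memory with the two techniques named in the abstract."""
    situation = "Communication with C2 using encoded subdomains"
    memory = EvolvableMemory()
    memory.add(KnowledgeInstance("T1132", "Data Encoding", situation,
                                 "focus is on the encoding method"))
    memory.add(KnowledgeInstance("T1071", "Application Layer Protocol", situation,
                                 "focus is on protocol usage"))
    return memory


if __name__ == "__main__":
    mem = build_example_memory()
    for inst in mem.candidates("Communication with C2 using encoded subdomains"):
        print(inst.technique_id, inst.technique_name, "->", inst.distinctive_feature)
```

In this sketch the first layer narrows a report sentence to a set of candidate techniques, and the second layer records the feature a reviewer (or a model) would check to choose between them, which is what makes the assignment inspectable.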
Related papers
- From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions [15.710492251334792]
TTPrompt is a framework shifting from implicit induction to explicit instruction.
FIR enables LLMs to self-refine guidelines by learning from errors on minimal labeled data.
With refinement on just 1% of training data, TTPrompt rivals models fine-tuned on the full dataset.
arXiv Detail & Related papers (2025-12-22T14:13:01Z)
- ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection [8.236011772496767]
We propose ExDoS to transfer semantic knowledge from source code to bytecode.
To address obscured local signals in graph-level contract embeddings, we propose a Dual-Attention Graph Network.
Experiments on real-world contracts demonstrate that our method achieves consistent F1-score improvements.
arXiv Detail & Related papers (2025-09-12T13:56:56Z)
- Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.
Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.
This paper decomposes LLM applications into a three-layer architecture: System Shell Layer, Prompt Orchestration Layer, and LLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
- Text-Driven Causal Representation Learning for Source-Free Domain Generalization [82.75041792888274]
We propose TDCRL, the first method to integrate causal inference into the source-free domain generalization setting.
Our approach offers a clear and effective mechanism to achieve robust, domain-invariant features, ensuring robust generalization.
arXiv Detail & Related papers (2025-07-14T06:20:42Z)
- Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation [5.296260279593993]
Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks.
We propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions.
Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment.
arXiv Detail & Related papers (2025-03-11T21:38:34Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Ownership Verification with Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world applications through retrieval-augmented generation (RAG) mechanisms.
Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning attacks.
We propose name for 'harmless' copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- A Progressive Transformer for Unifying Binary Code Embedding and Knowledge Transfer [15.689556592544667]
We introduce ProTST, a novel transformer-based methodology for binary code embedding.
ProTST employs a hierarchical training process based on a unique tree-like structure.
Results show that ProTST yields an average validation score (F1, MRR, and Recall@1) improvement of 14.8% compared to traditional two-stage training.
arXiv Detail & Related papers (2024-12-15T13:04:29Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition [49.536368818512116]
Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain.
We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two.
We propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism.
arXiv Detail & Related papers (2024-01-18T19:02:00Z)
- Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt [71.77504700496004]
Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts.
To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts.
However, how and what prompts can improve inference performance remains unclear.
arXiv Detail & Related papers (2022-05-23T07:51:15Z)
- Transductive Learning for Unsupervised Text Style Transfer [60.65782243927698]
Unsupervised style transfer models are mainly based on an inductive learning approach.
We propose a novel transductive learning approach based on a retrieval-based context-aware style representation.
arXiv Detail & Related papers (2021-09-16T08:57:20Z)
- Reinforcement Learning-powered Semantic Communication via Semantic Similarity [13.569045590522316]
We introduce a new semantic communication mechanism, whose key idea is to preserve the semantic information instead of strictly securing the bit-level precision.
We show that the commonly used bit-level metrics are poor at capturing important semantic meaning and structures.
We put forward a reinforcement learning (RL)-based solution which allows us to simultaneously optimize any user-defined semantic measurement.
arXiv Detail & Related papers (2021-08-27T05:21:05Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves on standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.