Related papers: Constructing Multi-label Hierarchical Classification Models for MITRE ATT&CK Text Tagging

Constructing Multi-label Hierarchical Classification Models for MITRE ATT&CK Text Tagging

URL: http://arxiv.org/abs/2601.14556v1
Date: Wed, 21 Jan 2026 00:41:34 GMT
Title: Constructing Multi-label Hierarchical Classification Models for MITRE ATT&CK Text Tagging
Authors: Andrew Crossman, Jonah Dodd, Viralam Ramamurthy Chaithanya Kumar, Riyaz Mohammed, Andrew R. Plummer, Chandra Sekharudu, Deepak Warrier, Mohammad Yekrangian,
Abstract summary: We provide a "task space" characterization of the MITRE ATT&CK text tagging task.<n>We construct our own multi-label hierarchical classification models for the text tagging task.<n>Our models meet or surpass state-of-the-art performance while relying only on classical machine learning methods.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: MITRE ATT&CK is a cybersecurity knowledge base that organizes threat actor and cyber-attack information into a set of tactics describing the reasons and goals threat actors have for carrying out attacks, with each tactic having a set of techniques that describe the potential methods used in these attacks. One major application of ATT&CK is the use of its tactic and technique hierarchy by security specialists as a framework for annotating cyber-threat intelligence reports, vulnerability descriptions, threat scenarios, inter alia, to facilitate downstream analyses. To date, the tagging process is still largely done manually. In this technical note, we provide a stratified "task space" characterization of the MITRE ATT&CK text tagging task for organizing previous efforts toward automation using AIML methods, while also clarifying pathways for constructing new methods. To illustrate one of the pathways, we use the task space strata to stage-wise construct our own multi-label hierarchical classification models for the text tagging task via experimentation over general cyber-threat intelligence text -- using shareable computational tools and publicly releasing the models to the security community (via https://github.com/jpmorganchase/MITRE_models). Our multi-label hierarchical approach yields accuracy scores of roughly 94% at the tactic level, as well as accuracy scores of roughly 82% at the technique level. The models also meet or surpass state-of-the-art performance while relying only on classical machine learning methods -- removing any dependence on LLMs, RAG, agents, or more complex hierarchical approaches. Moreover, we show that GPT-4o model performance at the tactic level is significantly lower (roughly 60% accuracy) than our own approach. We also extend our baseline model to a corpus of threat scenarios for financial applications produced by subject matter experts.

Related papers

MagicAgent: Towards Generalized Agent Planning [73.21129030631421]
We present textbfMagicAgent, a series of foundation models specifically designed for generalized agent planning.<n>We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks.<n>We show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks.
arXiv Detail & Related papers (2026-02-22T01:39:16Z)
Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems [0.0]
We present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants.<n>Our meta-analysis synthesizes findings from 78 recent studies.<n>Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class.
arXiv Detail & Related papers (2026-01-24T18:39:21Z)
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs [38.3239023969819]
Large Language Models (LLMs) have emerged as powerful tools for diverse applications.<n>We identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA)<n>We introduce Context-Aware Hierarchical Learning (CAHL) to address these vulnerabilities.
arXiv Detail & Related papers (2025-12-03T12:10:21Z)
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.<n>Model Extraction Attacks pose threats to intellectual property, privacy, and system security.<n>We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z)
ATAG: AI-Agent Application Threat Assessment with Attack Graphs [23.757154032523093]
This paper introduces AI-agent application Threat assessment with Attack Graphs (ATAG)<n>ATAG is a novel framework designed to systematically analyze the security risks associated with AI-agent applications.<n>It facilitates proactive identification and mitigation of AI-agent threats in multi-agent applications.
arXiv Detail & Related papers (2025-06-03T13:25:40Z)
Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models.<n>We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats.<n>Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z)
Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models [5.713349305091325]
We present a sentence classification system that can identify the attack techniques described in natural language sentences from cyber threat intelligence (CTI) reports.<n>We propose a new method for utilizing auxiliary data with the same labels to improve classification for the low-resource cyberattack classification task.
arXiv Detail & Related papers (2024-11-27T21:09:02Z)
CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.<n>Current CTI knowledge extraction methods lack flexibility and generalizability.<n>We propose CTINexus, a novel framework for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition [45.34519578504934]
Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain.<n>We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two.<n>We propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism.
arXiv Detail & Related papers (2024-01-18T19:02:00Z)
Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research. Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports [5.789368942487406]
We evaluate several classification approaches to automatically retrieve Tactics, Techniques and Procedures from unstructured text. We present rcATT, a tool built on top of our findings and freely distributed to the security community to support cyber threat report automated analysis.
arXiv Detail & Related papers (2020-04-29T16:45:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.