From Threat Reports to Continuous Threat Intelligence: A Comparison of
Attack Technique Extraction Methods from Textual Artifacts
- URL: http://arxiv.org/abs/2210.02601v1
- Date: Wed, 5 Oct 2022 23:21:41 GMT
- Title: From Threat Reports to Continuous Threat Intelligence: A Comparison of
Attack Technique Extraction Methods from Textual Artifacts
- Authors: Md Rayhanur Rahman, Laurie Williams
- Abstract summary: Threat reports contain detailed descriptions of attack Tactics, Techniques, and Procedures (TTP) written in an unstructured text format.
TTP extraction methods have been proposed in the literature, but not all of these methods have been compared with one another or with a baseline.
In this work, we identify ten existing TTP extraction studies from the literature and implement five methods from the ten studies.
We find that two methods, based on Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic Indexing (LSI), outperform the other three methods with F1 scores of 84% and 83%, respectively.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The cyberthreat landscape is continuously evolving. Hence, continuous
monitoring and sharing of threat intelligence have become a priority for
organizations. Threat reports, published by cybersecurity vendors, contain
detailed descriptions of attack Tactics, Techniques, and Procedures (TTP)
written in an unstructured text format. Extracting TTP from these reports helps
cybersecurity practitioners and researchers learn about and adapt to evolving
attacks and plan threat mitigation. Researchers have proposed TTP extraction
methods in the literature; however, not all of these methods have been
compared with one another or with a baseline. The goal of this study is to
aid cybersecurity researchers and practitioners in choosing attack technique
extraction methods for monitoring and sharing threat intelligence by comparing
the underlying methods from the TTP extraction studies in the literature. In
this work, we identify ten existing TTP extraction studies from the literature
and implement five methods from the ten studies. We find that two methods, based
on Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic
Indexing (LSI), outperform the other three methods with F1 scores of 84% and
83%, respectively. We observe that the F1 score of all methods drops as the
number of class labels increases exponentially. We also implement and evaluate
an oversampling strategy to mitigate class imbalance, and find that
oversampling improves the classification performance of TTP extraction. We
provide recommendations from our findings for future cybersecurity researchers,
such as constructing a benchmark dataset from a large corpus and selecting
textual features of TTP. Our work, along with the dataset and implementation
source code, can serve as a baseline for cybersecurity researchers to test and
compare the performance of future TTP extraction methods.
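The abstract names TF-IDF as one of the two best-performing representations and mentions oversampling for class imbalance, but it does not spell out a pipeline. The following is a minimal, dependency-free Python sketch of that general idea under stated assumptions: the toy sentences and their MITRE ATT&CK technique labels are hypothetical; a cosine k-nearest-neighbour classifier (`predict_knn`, a hypothetical helper) stands in for whatever classifiers the study actually used; simple random oversampling stands in for the study's oversampling strategy; and macro-averaged F1 is used, although the abstract does not state which averaging the reported scores use. LSI would add a truncated SVD of the TF-IDF matrix before the similarity step and is omitted here.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character (deliberately simple).
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def fit_idf(docs):
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, as in scikit-learn's default TfidfVectorizer.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    # Raw term counts weighted by IDF, then L2-normalised; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    # Both vectors are already L2-normalised, so cosine similarity is just the dot product.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def random_oversample(docs, labels, seed=0):
    # Assumed strategy: duplicate randomly chosen samples of every minority class
    # until each class has as many samples as the largest one.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for d, y in zip(docs, labels):
        by_label[y].append(d)
    target = max(len(items) for items in by_label.values())
    out_docs, out_labels = list(docs), list(labels)
    for y, items in by_label.items():
        for _ in range(target - len(items)):
            out_docs.append(rng.choice(items))
            out_labels.append(y)
    return out_docs, out_labels


def predict_knn(doc, train_docs, train_labels, idf, k=3):
    # Stand-in classifier (not necessarily the study's): rank training sentences
    # by cosine similarity and take a majority vote of the top k.
    q = tfidf(doc, idf)
    ranked = sorted(zip(train_docs, train_labels),
                    key=lambda pair: cosine(q, tfidf(pair[0], idf)), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    # Per-class F1 = 2PR / (P + R), averaged with equal weight for every class.
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy training data: threat-report sentences labelled with ATT&CK technique IDs.
train = [
    ("The attacker sent a spearphishing email with a malicious attachment", "T1566"),
    ("Victims received phishing emails with a malicious link", "T1566"),
    ("The malware executed PowerShell commands to download a payload", "T1059"),
    ("A PowerShell script was run through the command line", "T1059"),
    ("The implant ran commands in the Windows command shell", "T1059"),
    ("The actor dumped credentials from LSASS memory with Mimikatz", "T1003"),
]
docs, labels = zip(*train)
idf = fit_idf(docs)  # fit IDF on the original sentences, before duplication
X, y = random_oversample(list(docs), list(labels))

test = [
    ("Employees got a phishing email carrying a malicious attachment", "T1566"),
    ("Mimikatz was used to dump LSASS credentials", "T1003"),
]
preds = [predict_knn(d, X, y, idf) for d, _ in test]
print(preds, macro_f1([label for _, label in test], preds))
```

Fitting the IDF weights before oversampling keeps duplicated sentences from inflating document frequencies. A k-NN vote is used in this sketch because it is sensitive to class counts, which is where oversampling can plausibly change predictions; a linear classifier over the same TF-IDF vectors would be an equally reasonable stand-in.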
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
Date: 2024-10-28T14:18:32Z
- A Comparison of Vulnerability Feature Extraction Methods from Textual Attack Patterns
This paper aims to aid cybersecurity researchers and practitioners in choosing attack extraction methods.
Term Frequency-Inverse Document Frequency (TF-IDF) outperforms the other four methods with a precision of 75% and an F1 score of 64%.
Date: 2024-07-09T11:04:49Z
- Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models
This paper presents a comprehensive empirical study focused on identifying persuasive techniques in Arabic social media content.
We utilize Pre-trained Language Models (PLMs) and leverage the ArAlEval dataset.
Our study explores three different learning approaches by harnessing the power of PLMs.
Date: 2024-05-21T15:55:09Z
- TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports
Knowing the modus operandi of adversaries aids organizations in employing efficient defensive strategies and sharing intelligence in the community.
A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and translate it into a structured format.
This research introduces a methodology named TTPXHunter for the automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports.
Date: 2024-03-05T19:04:09Z
- Text generation for dataset augmentation in security classification tasks
This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
Date: 2023-10-22T22:25:14Z
- Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
Date: 2022-07-18T09:59:21Z
- Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
Date: 2022-02-07T11:18:04Z
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
Date: 2021-09-24T17:37:35Z
- What are the attackers doing now? Automating cyber threat intelligence extraction from text on pace with the changing threat landscape: A survey
We systematically collect "CTI extraction from text"-related studies from the literature.
We identify the data sources, techniques, and CTI sharing formats utilized in the context of the proposed pipeline.
Date: 2021-09-14T16:38:41Z
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
Date: 2021-07-05T16:32:45Z
- Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports
We evaluate several classification approaches to automatically retrieve Tactics, Techniques and Procedures from unstructured text.
We present rcATT, a tool built on top of our findings and freely distributed to the security community to support cyber threat report automated analysis.
Date: 2020-04-29T16:45:14Z