What are the attackers doing now? Automating cyber threat intelligence
extraction from text on pace with the changing threat landscape: A survey
- URL: http://arxiv.org/abs/2109.06808v1
- Date: Tue, 14 Sep 2021 16:38:41 GMT
- Title: What are the attackers doing now? Automating cyber threat intelligence
extraction from text on pace with the changing threat landscape: A survey
- Authors: Md Rayhanur Rahman, Rezvan Mahdavi-Hezaveh, Laurie Williams
- Abstract summary: We systematically collect "CTI extraction from text"-related studies from the literature.
We identify the data sources, techniques, and CTI sharing formats utilized in the context of the proposed pipeline.
- Score: 1.1064955465386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cybersecurity researchers have contributed to the automated extraction of CTI
from textual sources, such as threat reports and online articles, where
cyberattack strategies, procedures, and tools are described. The goal of this
article is to aid cybersecurity researchers understand the current techniques
used for cyberthreat intelligence extraction from text through a survey of
relevant studies in the literature. We systematically collect "CTI extraction
from text"-related studies from the literature and categorize the CTI
extraction purposes. We propose a CTI extraction pipeline abstracted from these
studies. We identify the data sources, techniques, and CTI sharing formats
utilized in the context of the proposed pipeline. Our work finds ten types of
extraction purposes, such as extraction indicators of compromise extraction,
TTPs (tactics, techniques, procedures of attack), and cybersecurity keywords.
We also identify seven types of textual sources for CTI extraction, and textual
data obtained from hacker forums, threat reports, social media posts, and
online news articles have been used by almost 90% of the studies. Natural
language processing along with both supervised and unsupervised machine
learning techniques such as named entity recognition, topic modelling,
dependency parsing, supervised classification, and clustering are used for CTI
extraction. We observe the technical challenges associated with these studies
related to obtaining available clean, labelled data which could assure
replication, validation, and further extension of the studies. As we find the
studies focusing on CTI information extraction from text, we advocate for
building upon the current CTI extraction work to help cybersecurity
practitioners with proactive decision making such as threat prioritization,
automated threat modelling to utilize knowledge from past cybersecurity
incidents.
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs)
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z) - Combatting Human Trafficking in the Cyberspace: A Natural Language
Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z) - NLP-Based Techniques for Cyber Threat Intelligence [13.958337678497163]
Survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence.
It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets.
It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
arXiv Detail & Related papers (2023-11-15T09:23:33Z) - Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z) - Towards Relation Extraction From Speech [56.36416922396724]
We propose a new listening information extraction task, i.e., speech relation extraction.
We construct the training dataset for speech relation extraction via text-to-speech systems, and we construct the testing dataset via crowd-sourcing with native English speakers.
We conduct comprehensive experiments to distinguish the challenges in speech relation extraction, which may shed light on future explorations.
arXiv Detail & Related papers (2022-10-17T05:53:49Z) - From Threat Reports to Continuous Threat Intelligence: A Comparison of
Attack Technique Extraction Methods from Textual Artifacts [11.396560798899412]
Threat reports contain detailed descriptions of attack Tactics, Techniques, and Procedures (TTP) written in an unstructured text format.
TTP extraction methods are proposed in the literature, but not all of these methods are compared to one another or to a baseline.
In this work, we identify ten existing TTP extraction studies from the literature and implement five methods from the ten studies.
We find two methods, based on Term Frequency-Inverse Document Frequency(TFIDF) and Latent Semantic Indexing (LSI), outperform the other three methods with a F1 score of 84% and 83%,
arXiv Detail & Related papers (2022-10-05T23:21:41Z) - Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z) - Towards Automated Classification of Attackers' TTPs by combining NLP
with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z) - EXTRACTOR: Extracting Attack Behavior from Threat Reports [6.471387545969443]
We propose a novel approach and tool called provenanceOR that allows precise automatic extraction of concise attack behaviors from CTI reports.
provenanceOR makes no strong assumptions about the text and is capable of extracting attack behaviors as graphs from unstructured text.
Our evaluation results show that provenanceOR can extract concise graphs from CTI reports and can successfully be used by cyber-analytics tools in threat-hunting.
arXiv Detail & Related papers (2021-04-17T18:51:00Z) - Survey of Network Intrusion Detection Methods from the Perspective of
the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector.
We discuss the techniques used for the capture, preparation and transformation of the data, as well as, the data mining and evaluation methods.
As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.