Generating Cyber Threat Intelligence to Discover Potential Security
Threats Using Classification and Topic Modeling
- URL: http://arxiv.org/abs/2108.06862v1
- Date: Mon, 16 Aug 2021 02:30:29 GMT
- Title: Generating Cyber Threat Intelligence to Discover Potential Security
Threats Using Classification and Topic Modeling
- Authors: Md Imran Hossen, Ashraful Islam, Farzana Anowar, Eshtiak Ahmed,
Mohammad Masudur Rahman
- Abstract summary: Cyber Threat Intelligence (CTI) has been presented as one of the proactive and robust mechanisms for automated, data-driven threat prediction.
Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
- Score: 6.0897744845912865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the variety of cyber-attacks and threats, the cybersecurity community
has been enhancing traditional security control mechanisms to an advanced
level so that automated tools can counter potential security threats. Very
recently, Cyber Threat Intelligence (CTI) has been presented as one of
the proactive and robust mechanisms because of its automated, data-driven
prediction of cybersecurity threats. In general, CTI collects and analyzes data
from various sources, e.g., online security forums and social media where cyber
enthusiasts, analysts, and even cybercriminals discuss cyber or computer
security-related topics, and discovers potential threats based on the analysis. As the
manual analysis of every such discussion, i.e., posts on online platforms, is
time-consuming, inefficient, and susceptible to errors, CTI as an automated
tool is well suited to detecting cyber threats. In this paper, our goal is
to identify and explore relevant CTI from hacker forums by using different
supervised and unsupervised learning techniques. To this end, we collected data
from a real hacker forum and constructed two datasets: a binary dataset and a
multi-class dataset. Our binary dataset contains two classes: one containing
cybersecurity-relevant posts and another containing posts that are not
related to security. This dataset is constructed using a simple keyword search
technique. Using a similar approach, we further categorize the
security-relevant posts into five different threat categories. We then applied
several machine learning classifiers, along with deep neural network-based
classifiers, to the datasets to compare their performance. We also
tested the classifiers on a labeled leaked dataset, nulled.io, as our
ground truth. We further explore the datasets using unsupervised techniques,
i.e., Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization
(NMF).
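The keyword-based dataset construction described in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not list the keywords or name the five threat categories, so `SECURITY_KEYWORDS`, `CATEGORY_KEYWORDS`, the category names, and the function names are all hypothetical, and the authors' actual procedure may differ.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical keyword list for the binary (security vs. non-security) split.
# The abstract does not give the real keywords.
SECURITY_KEYWORDS: Set[str] = {
    "exploit", "malware", "ransomware", "phishing",
    "botnet", "keylogger", "ddos", "password",
}

# Hypothetical threat categories and keywords; the abstract only says there
# are five categories, not what they are.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "malware": {"malware", "ransomware", "keylogger", "trojan"},
    "phishing": {"phishing", "spoof"},
    "ddos": {"ddos", "botnet"},
    "credential_leak": {"password", "dump", "combo"},
    "exploit": {"exploit", "0day"},
}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    """Lowercase the post and return its set of alphanumeric tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def is_security_relevant(post: str) -> bool:
    """Binary label: True if the post contains any security keyword."""
    return bool(tokenize(post) & SECURITY_KEYWORDS)


def threat_category(post: str) -> Optional[str]:
    """Multi-class label: the category with the most keyword hits, or None."""
    tokens = tokenize(post)
    best, best_hits = None, 0
    for name, keywords in CATEGORY_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:  # ties keep the category listed first
            best, best_hits = name, hits
    return best


def build_datasets(
    posts: List[str],
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, str]]]:
    """Return (binary dataset, multi-class dataset) as lists of (post, label)."""
    binary = [(p, int(is_security_relevant(p))) for p in posts]
    multiclass = []
    for post, label in binary:
        category = threat_category(post) if label else None
        if category is not None:
            multiclass.append((post, category))
    return binary, multiclass


if __name__ == "__main__":
    sample_posts = [
        "Selling new ransomware builder with keylogger",
        "Anyone want to trade game skins?",
        "Fresh password dump from a forum",
    ]
    binary, multiclass = build_datasets(sample_posts)
    print(binary)
    print(multiclass)
```

Labels produced this way are only as good as the keyword lists, which is presumably why the abstract also reports testing the classifiers on a separately labeled dataset.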
Related papers
- NLP-Based Techniques for Cyber Threat Intelligence [13.958337678497163]
This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence.
It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets.
It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
(2023-11-15T09:23:33Z)
- Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks, such as malware, spam, and intrusions, has caused severe consequences for society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
(2023-04-02T08:43:03Z)
- Exploring the Limits of Transfer Learning with Unified Model in the Cybersecurity Domain [17.225973170682604]
We introduce a generative multi-task model, Unified Text-to-Text Cybersecurity (UTS).
UTS is trained on malware reports, phishing site URLs, programming code constructs, social media data, blogs, news articles, and public forum posts.
We show that UTS improves performance on some cybersecurity datasets.
(2023-02-20T22:21:26Z)
- Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG).
(2022-08-02T18:44:06Z)
- Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
(2022-07-18T09:59:21Z)
- Generating Fake Cyber Threat Intelligence Using Transformer-Based Models [2.9328913897054583]
We show that a public language model like GPT-2 can generate plausible CTI text capable of corrupting cyber-defense systems.
We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus.
(2021-02-08T16:54:35Z)
- A System for Efficiently Hunting for Cyber Threats in Computer Systems Using Threat Intelligence [78.23170229258162]
We build ThreatRaptor, a system that facilitates cyber threat hunting in computer systems using OSCTI.
ThreatRaptor provides (1) an unsupervised, light-weight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, and (3) a query synthesis mechanism that automatically synthesizes a TBQL query from the extracted threat behaviors.
(2021-01-17T19:44:09Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
(2020-12-18T22:38:47Z)
- Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence [94.94833077653998]
ThreatRaptor is a system that facilitates threat hunting in computer systems using open-source Cyber Threat Intelligence (OSCTI).
It extracts structured threat behaviors from unstructured OSCTI text and uses a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities.
Evaluations on a broad set of attack cases demonstrate the accuracy and efficiency of ThreatRaptor in practical threat hunting.
(2020-10-26T14:54:01Z)
- Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain [58.30296637276011]
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques.
It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
(2020-07-05T18:22:40Z)
- Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports [5.789368942487406]
We evaluate several classification approaches to automatically retrieve Tactics, Techniques and Procedures from unstructured text.
We present rcATT, a tool built on top of our findings and freely distributed to the security community to support automated analysis of cyber threat reports.
(2020-04-29T16:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.