Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling
- URL: http://arxiv.org/abs/2108.06862v1
- Date: Mon, 16 Aug 2021 02:30:29 GMT
- Title: Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling
- Authors: Md Imran Hossen, Ashraful Islam, Farzana Anowar, Eshtiak Ahmed, Mohammad Masudur Rahman
- Abstract summary: Cyber Threat Intelligence (CTI) has been presented as one of the proactive and robust mechanisms against cyber threats.
Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the variety of cyber-attacks or threats, the cybersecurity community
has been enhancing the traditional security control mechanisms to an advanced
level so that automated tools can counter potential security threats. Recently,
Cyber Threat Intelligence (CTI) has been presented as one of the proactive and
robust mechanisms because it enables automated, data-driven prediction of
cybersecurity threats. In general, CTI collects and analyses data from various
sources, e.g., online security forums and social media, where cyber enthusiasts,
analysts, and even cybercriminals discuss cyber or computer security-related
topics, and discovers potential threats based on the analysis. As manual
analysis of every such discussion, i.e., posts on online platforms, is
time-consuming, inefficient, and susceptible to errors, CTI as an automated
tool can play a unique role in detecting cyber threats. In this paper, our goal
is to identify and explore relevant CTI from hacker forums by using different
supervised and unsupervised learning techniques. To this end, we collected data
from a real hacker forum and constructed two datasets: a binary dataset and a
multi-class dataset. Our binary dataset contains two classes: one containing
cybersecurity-relevant posts and another containing posts that are not related
to security. This dataset was constructed using a simple keyword search
technique. Using a similar approach, we further categorized the
security-relevant posts into five different threat categories. We then applied
several machine learning classifiers, along with deep neural network-based
classifiers, to the datasets to compare their performance. We also tested the
classifiers on a labeled leaked dataset, nulled.io, as our ground truth. We
further explored the datasets using unsupervised techniques, namely Latent
Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF).
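The abstract states that both datasets were built by keyword search: posts matching security keywords form the relevant class of the binary dataset, and a similar search sorts relevant posts into five threat categories. The Python sketch below illustrates that labeling idea under stated assumptions. The category names and keyword lists are hypothetical placeholders, since the abstract names neither, and resolving overlaps by keyword-hit count (first category wins on an exact tie) is an assumed rule, not the authors' documented one.

```python
import re

# Hypothetical threat categories and keywords: the abstract says there are five
# categories but does not name them or publish the keyword lists.
CATEGORY_KEYWORDS = {
    "malware": {"malware", "keylogger", "crypter", "rat"},
    "phishing": {"phishing", "scam page", "fake login"},
    "ddos": {"ddos", "booter", "botnet"},
    "web_attack": {"sql injection", "xss", "exploit"},
    "credential_leak": {"combo list", "dump", "leaked accounts"},
}

# Assumed: the binary "security-relevant" keyword set is the union of all
# category keywords plus a few generic (also hypothetical) terms.
SECURITY_KEYWORDS = set().union(*CATEGORY_KEYWORDS.values()) | {"vulnerability", "hacking"}


def _count_hits(text, keywords):
    """Count keywords that occur in text as whole words or phrases (case-insensitive)."""
    text = text.lower()
    return sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text))


def label_binary(post):
    """Return 1 if the post mentions any security keyword, else 0."""
    return int(_count_hits(post, SECURITY_KEYWORDS) > 0)


def label_category(post):
    """Assign a post to the category with the most keyword hits.

    Ties go to the first category in dict order (an assumption). Returns None
    when no category keyword matches.
    """
    scores = {cat: _count_hits(post, kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    posts = [
        "Selling FUD crypter and keylogger bundle",
        "What laptop should I buy for gaming?",
        "New SQL injection exploit for old CMS",
    ]
    for post in posts:
        print(label_binary(post), label_category(post), "|", post)
```

A post with no category hit gets `None`; how the authors handled such posts is not stated in the abstract.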
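The abstract does not list which classical classifiers were compared, so the sketch below uses multinomial naive Bayes over bag-of-words counts, a common text-classification baseline, purely as a stand-in rather than as the paper's model. It also computes precision, recall, and F1 for the positive (security-relevant) class, the kind of check one could run against labeled ground truth such as the nulled.io posts; the choice of metrics is an assumption. The names `NaiveBayesText` and `precision_recall_f1` are hypothetical and do not refer to any library API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesText:
    """Multinomial naive Bayes over bag-of-words counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant added to every word count

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        """Return the class with the highest log posterior for the text."""
        v = len(self.vocab)
        tokens = [w for w in tokenize(text) if w in self.vocab]  # drop unseen words

        def log_posterior(c):
            denom = self.total_words[c] + self.alpha * v
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + self.alpha) / denom) for w in tokens
            )

        return max(self.classes, key=log_posterior)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, comparing predictions to ground truth."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In practice the training texts would come from the keyword-labeled forum posts and the evaluation labels from a separately labeled set, so that the classifier is not scored against the same keyword rules that produced its training labels.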
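NMF, one of the two unsupervised techniques named in the abstract, factors a nonnegative document-term matrix V into W (document-topic weights) and H (topic-term weights), so each row of H can be read as a topic. The dependency-free sketch below uses the Lee and Seung multiplicative updates for the Frobenius loss. The toy matrix, vocabulary, and iteration count are hypothetical; on real forum data V would typically hold term counts or TF-IDF weights, and the paper's NMF settings are not given in the abstract.

```python
import random


def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative docs-x-terms matrix V as W @ H, with W (docs x k)
    and H (k x terms), using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def top_terms(H, vocab, n=3):
    """Return the n highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]] for row in H]


if __name__ == "__main__":
    # Hypothetical term counts for four posts over six terms.
    vocab = ["crypter", "keylogger", "rat", "ddos", "botnet", "booter"]
    V = [[2, 1, 1, 0, 0, 0], [1, 2, 0, 0, 0, 0], [0, 0, 0, 2, 1, 1], [0, 0, 0, 1, 2, 1]]
    W, H = nmf(V, k=2)
    print(top_terms(H, vocab))
```

Multiplicative updates keep every entry nonnegative by construction and do not increase the Frobenius loss, which is why they are a common minimal choice; library implementations usually add better initialization and stopping criteria.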
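LDA, the other unsupervised technique named in the abstract, can be fitted in several ways. The sketch below uses collapsed Gibbs sampling only because it fits in a few lines of plain Python; the abstract does not say which inference method or hyperparameters the authors used. Here `alpha` and `beta` are symmetric Dirichlet priors on the document-topic and topic-word distributions, and their default values, like the function name `lda_gibbs`, are assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, phi, theta), where phi[k][v]
    estimates P(word v | topic k) and theta[d][k] estimates P(topic k | doc d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta
```

Sorting each row of `phi` gives the most probable words per topic, and `theta` gives each post's topic mixture, which is how topics are usually inspected by hand.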
Related papers
- Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning (2025-11-28T13:51:18Z)
This study applies text mining techniques and machine learning to data collected from Dark Web forums in Brazilian Portuguese to identify malicious posts. To our knowledge, this is the first study to focus specifically on Brazilian Portuguese content in this domain. The best-performing model, using LightGBM and TF-IDF, was able to detect relevant posts with high accuracy.
- An Unsupervised Learning Approach For A Reliable Profiling Of Cyber Threat Actors Reported Globally Based On Complete Contextual Information Of Cyber Attacks (2025-09-15T08:32:59Z)
It is critical to promptly recognize cyberattacks and establish strong defense mechanisms against them. Creating a profile of cyber threat actors based on their traits or patterns of behavior can help to create effective defenses against cyberattacks in advance. In this paper, an efficient, unsupervised agglomerative hierarchical clustering technique is proposed for profiling cybercriminal groups.
- Cyber Threat Hunting: Non-Parametric Mining of Attack Patterns from Cyber Threat Intelligence for Precise Threats Attribution (2025-09-15T06:15:22Z)
We propose a machine learning based approach featuring a visually interactive analytics tool named the Cyber-Attack Pattern Explorer (CAPE). In the proposed system, a non-parametric mining technique is proposed to create a dataset for identifying the attack patterns within cyber threat intelligence documents. The extracted dataset is used to train the proposed machine learning algorithms, which enable the attribution of cyber threats to their respective actors.
- The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks (2025-08-18T15:46:36Z)
Threat Modeling can provide critical support to cybersecurity professionals, enabling them to take timely action and allocate resources that could be used elsewhere. Recently, there has been a pressing need for automated methods to assess attack descriptions and forecast the future consequences of cyberattacks. This study examines how Natural Language Processing (NLP) and deep learning can be applied to analyze the potential impact of cyberattacks.
- False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems (2025-07-05T19:00:27Z)
Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. This study investigates vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks.
- Countering Autonomous Cyber Threats (2024-10-23T22:46:44Z)
Foundation Models (FMs) present dual-use concerns broadly and within the cyber domain specifically.
Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations.
This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
- NLP-Based Techniques for Cyber Threat Intelligence (2023-11-15T09:23:33Z)
This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence.
It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets.
It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
- Graph Mining for Cybersecurity: A Survey (2023-04-02T08:43:03Z)
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, has had severe consequences for society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they can hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
- Exploring the Limits of Transfer Learning with Unified Model in the Cybersecurity Domain (2023-02-20T22:21:26Z)
We introduce a generative multi-task model, Unified Text-to-Text Cybersecurity (UTS).
UTS is trained on malware reports, phishing site URLs, programming code constructs, social media data, blogs, news articles, and public forum posts.
We show that UTS improves performance on some cybersecurity datasets.
- Recognizing and Extracting Cybersecurtity-relevant Entities from Text (2022-08-02T18:44:06Z)
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG).
- Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques (2022-07-18T09:59:21Z)
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations, we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
- Generating Fake Cyber Threat Intelligence Using Transformer-Based Models (2021-02-08T16:54:35Z)
We show that a public language model like GPT-2 can generate plausible CTI text capable of corrupting cyber-defense systems.
We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus.
- A System for Efficiently Hunting for Cyber Threats in Computer Systems Using Threat Intelligence (2021-01-17T19:44:09Z)
We build ThreatRaptor, a system that facilitates cyber threat hunting in computer systems using OSCTI.
ThreatRaptor provides (1) an unsupervised, light-weight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, and (3) a query synthesis mechanism that automatically synthesizes a TBQL query from the extracted threat behaviors.
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses (2020-12-18T22:38:47Z)
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
- Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence (2020-10-26T14:54:01Z)
ThreatRaptor is a system that facilitates threat hunting in computer systems using open-source Cyber Threat Intelligence (OSCTI).
It extracts structured threat behaviors from unstructured OSCTI text and uses a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities.
Evaluations on a broad set of attack cases demonstrate the accuracy and efficiency of ThreatRaptor in practical threat hunting.
- Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain (2020-07-05T18:22:40Z)
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques.
It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
- Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports (2020-04-29T16:45:14Z)
We evaluate several classification approaches to automatically retrieve Tactics, Techniques and Procedures from unstructured text.
We present rcATT, a tool built on top of our findings and freely distributed to the security community to support automated analysis of cyber threat reports.
This list is automatically generated from the titles and abstracts of the papers in this site.