Related papers: Recognizing and Extracting Cybersecurtity-relevant Entities from Text

Recognizing and Extracting Cybersecurtity-relevant Entities from Text

URL: http://arxiv.org/abs/2208.01693v1
Date: Tue, 2 Aug 2022 18:44:06 GMT
Title: Recognizing and Extracting Cybersecurtity-relevant Entities from Text
Authors: Casey Hanks, Michael Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi
Abstract summary: Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks. CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
Score: 1.7499351967216343
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.

Related papers

Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning [0.0]
This study applies text mining techniques and machine learning to data collected from Dark Web forums in Brazilian Portuguese to identify malicious posts.<n>To our knowledge, this is the first study to focus specifically on Brazilian Portuguese content in this domain.<n>The best-performing model, using LightGBM and TF-IDF, was able to detect relevant posts with high accuracy.
arXiv Detail & Related papers (2025-11-28T13:51:18Z)
False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems [1.4932549821542682]
Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle.<n>Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction.<n>This study investigates vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks.
arXiv Detail & Related papers (2025-07-05T19:00:27Z)
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [50.268821168513654]
We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLMs) built on the Llama 3.1 architecture. We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
arXiv Detail & Related papers (2025-04-28T08:41:12Z)
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats. Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
A dataset for cyber threat intelligence modeling of connected autonomous vehicles [17.58243748365034]
This paper reports the creation of a cyber threat intelligence corpus focusing on vehicle cybersecurity knowledge mining. The proposed dataset will serve as a valuable resource for evaluating the performance of existing algorithms and advancing research in cyber threat intelligence modeling within the automotive field.
arXiv Detail & Related papers (2024-10-18T16:55:12Z)
KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment [38.312774244521]
We propose a knowledge graph-based verifier for Cyber Threat Intelligence (CTI) quality assessment framework. Our approach introduces Large Language Models (LLMs) to automatically extract OSCTI key claims to be verified. To fill the gap in the research field, we created and made public the first dataset for threat intelligence assessment from heterogeneous sources.
arXiv Detail & Related papers (2024-08-15T11:32:46Z)
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs) Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z)
NLP-Based Techniques for Cyber Threat Intelligence [13.958337678497163]
Survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
arXiv Detail & Related papers (2023-11-15T09:23:33Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
ThreatKG: An AI-Powered System for Automated Open-Source Cyber Threat Intelligence Gathering and Management [65.0114141380651]
ThreatKG is an automated system for OSCTI gathering and management. It efficiently collects a large number of OSCTI reports from multiple sources. It uses specialized AI-based techniques to extract high-quality knowledge about various threat entities.
arXiv Detail & Related papers (2022-12-20T16:13:59Z)
Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research. Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms. Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv Detail & Related papers (2021-08-16T02:30:29Z)
A System for Automated Open-Source Threat Intelligence Gathering and Management [53.65687495231605]
SecurityKG is a system for automated OSCTI gathering and management. It uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors.
arXiv Detail & Related papers (2021-01-19T18:31:35Z)
A System for Efficiently Hunting for Cyber Threats in Computer Systems Using Threat Intelligence [78.23170229258162]
We build ThreatRaptor, a system that facilitates cyber threat hunting in computer systems using OSCTI. ThreatRaptor provides (1) an unsupervised, light-weight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, and (3) a query synthesis mechanism that automatically synthesizes a TBQL query from the extracted threat behaviors.
arXiv Detail & Related papers (2021-01-17T19:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.