Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction
- URL: http://arxiv.org/abs/2501.06239v1
- Date: Wed, 08 Jan 2025 12:35:17 GMT
- Title: Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction
- Authors: Olga Sorokoletova, Emanuele Antonioni, Giordano Colò,
- Abstract summary: This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction.
The proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships.
Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning.
- Score: 0.0
- License:
- Abstract: Cyber Threat Intelligence (CTI) is critical for mitigating threats to organizations, governments, and institutions, yet the necessary data are often dispersed across diverse formats. AI-driven solutions for CTI Information Extraction (IE) typically depend on high-quality, annotated data, which are not always available. This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction. Leveraging advanced Natural Language Processing (NLP) techniques, particularly Transformer-based architectures, the proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships. Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning. Unlike existing state-of-the-art models that rely heavily on annotated datasets, our system enables fully dataless operation through zero-shot methods for both Entity and Relation Extraction, making it adaptable to various data availability scenarios. Additionally, our supervised Entity Extractor surpasses current state-of-the-art performance in cyber Entity Extraction, highlighting the dual strength of the framework in both low-resource and data-rich environments. By aligning the system's outputs with the Structured Threat Information Expression (STIX) format, a standard for information exchange in the cybersecurity domain, 0-CTI standardizes extracted knowledge, enhancing communication and collaboration in cybersecurity operations.
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems [65.22300383287904]
Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries.
By digitizing data throughout product life cycles, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures.
GenAI can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing.
arXiv Detail & Related papers (2024-08-02T10:47:10Z) - Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs)
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z) - Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track the semantic changes, channel condition, to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z) - AGIR: Automating Cyber Threat Intelligence Reporting with Natural
Language Generation [15.43868945929965]
We introduce AGIR (Automatic Generation of Intelligence Reports), a transformative tool for CTI reporting.
AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports.
We evaluate AGIR's report generation capabilities both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-10-04T08:25:37Z) - Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the
Wild [2.4669630540735215]
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations.
Existing tools for automated structured CTI extraction have performance limitations.
We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
arXiv Detail & Related papers (2023-07-14T13:43:16Z) - Causal Semantic Communication for Digital Twins: A Generalizable
Imitation Learning Approach [74.25870052841226]
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing, and artificial intelligence (AI) technologies to enable many connected intelligence services.
Wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints.
A novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems.
arXiv Detail & Related papers (2023-04-25T00:15:00Z) - ThreatKG: An AI-Powered System for Automated Open-Source Cyber Threat Intelligence Gathering and Management [65.0114141380651]
ThreatKG is an automated system for OSCTI gathering and management.
It efficiently collects a large number of OSCTI reports from multiple sources.
It uses specialized AI-based techniques to extract high-quality knowledge about various threat entities.
arXiv Detail & Related papers (2022-12-20T16:13:59Z) - Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.