AGIR: Automating Cyber Threat Intelligence Reporting with Natural
Language Generation
- URL: http://arxiv.org/abs/2310.02655v1
- Date: Wed, 4 Oct 2023 08:25:37 GMT
- Title: AGIR: Automating Cyber Threat Intelligence Reporting with Natural
Language Generation
- Authors: Filippo Perrina, Francesco Marchiori, Mauro Conti, Nino Vincenzo Verde
- Abstract summary: We introduce AGIR (Automatic Generation of Intelligence Reports), a transformative tool for CTI reporting.
AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports.
We evaluate AGIR's report generation capabilities both quantitatively and qualitatively.
- Score: 15.43868945929965
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk
management strategies. As the volume of CTI reports continues to surge, the
demand for automated tools to streamline report generation becomes increasingly
apparent. While Natural Language Processing techniques have shown potential in
handling text data, they often struggle to address the complexity of diverse
data sources and their intricate interrelationships. Moreover, established
paradigms like STIX have emerged as de facto standards within the CTI
community, emphasizing the formal categorization of entities and relations to
facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic
Generation of Intelligence Reports), a transformative Natural Language
Generation tool specifically designed to address the pressing challenges in the
realm of CTI reporting. AGIR's primary objective is to empower security
analysts by automating the labor-intensive task of generating comprehensive
intelligence reports from formal representations of entity graphs. AGIR
utilizes a two-stage pipeline by combining the advantages of template-based
approaches and the capabilities of Large Language Models such as ChatGPT. We
evaluate AGIR's report generation capabilities both quantitatively and
qualitatively. The generated reports accurately convey information expressed
through formal language, achieving a high recall value (0.99) without
introducing hallucination. Furthermore, we compare the fluency and utility of
the reports with state-of-the-art approaches, showing how AGIR achieves higher
scores in terms of Syntactic Log-Odds Ratio (SLOR) and through questionnaires.
By using our tool, we estimate that the report writing time is reduced by more
than 40%, therefore streamlining the CTI production of any organization and
contributing to the automation of several CTI tasks.
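The two-stage pipeline described in the abstract (template-based realization of a formal entity graph, followed by LLM-based polishing) can be sketched as follows. This is a minimal illustration under assumptions, not AGIR's actual code: the triples, template strings, and function names are all hypothetical, and the LLM stage is stubbed out.

```python
# Toy STIX-like entity graph: (subject, relation, object) triples.
graph = [
    ("APT29", "uses", "Cobalt Strike"),
    ("Cobalt Strike", "targets", "Windows"),
]

# Stage 1: deterministic template realization. Because every triple is
# rendered by a fixed template, each fact is guaranteed to appear in the
# draft (high recall, no hallucination).
TEMPLATES = {
    "uses": "{s} has been observed using {o}.",
    "targets": "{s} targets {o} systems.",
}

def realize(triples):
    """Render each triple with its relation's template."""
    return " ".join(TEMPLATES[r].format(s=s, o=o) for s, r, o in triples)

# Stage 2 (stubbed): an LLM such as ChatGPT would rewrite the draft for
# fluency; a recall check can then verify that every entity from the
# formal representation survived the rewrite.
def recall_check(text, triples):
    entities = {x for s, _, o in triples for x in (s, o)}
    return all(e in text for e in entities)

draft = realize(graph)
print(draft)                        # template-based draft report
print(recall_check(draft, graph))   # True: all entities preserved
```

The design point this illustrates is why a hybrid pipeline can report a 0.99 recall without hallucination: the template stage fixes the factual content, while the LLM stage only improves fluency and is validated against the original graph.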
Related papers
- EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents [16.65035686422735]
The paper introduces EICopilot, a novel agent-based solution enhancing search and exploration of enterprise registration data within online knowledge graphs.
The solution automatically generates and executes Gremlin scripts, providing efficient summaries of complex enterprise relationships.
Empirical evaluations demonstrate the superior performance of EICopilot, including speed and accuracy, over baseline methods.
arXiv Detail & Related papers (2025-01-23T15:22:25Z)
- Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction [0.0]
This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction.
The proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships.
Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning.
arXiv Detail & Related papers (2025-01-08T12:35:17Z)
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which tracks semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
- Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
- Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild [2.4669630540735215]
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations.
Existing tools for automated structured CTI extraction have performance limitations.
We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
arXiv Detail & Related papers (2023-07-14T13:43:16Z)
- AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging "entity triggers".
Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique.
AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
arXiv Detail & Related papers (2021-09-10T08:11:56Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.