AGIR: Automating Cyber Threat Intelligence Reporting with Natural
Language Generation
- URL: http://arxiv.org/abs/2310.02655v1
- Date: Wed, 4 Oct 2023 08:25:37 GMT
- Authors: Filippo Perrina, Francesco Marchiori, Mauro Conti, Nino Vincenzo Verde
- Abstract summary: We introduce AGIR (Automatic Generation of Intelligence Reports), a transformative tool for CTI reporting.
AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports.
We evaluate AGIR's report generation capabilities both quantitatively and qualitatively.
- Score: 15.43868945929965
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk
management strategies. As the volume of CTI reports continues to surge, the
demand for automated tools to streamline report generation becomes increasingly
apparent. While Natural Language Processing techniques have shown potential in
handling text data, they often struggle to address the complexity of diverse
data sources and their intricate interrelationships. Moreover, established
paradigms like STIX have emerged as de facto standards within the CTI
community, emphasizing the formal categorization of entities and relations to
facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic
Generation of Intelligence Reports), a transformative Natural Language
Generation tool specifically designed to address the pressing challenges in the
realm of CTI reporting. AGIR's primary objective is to empower security
analysts by automating the labor-intensive task of generating comprehensive
intelligence reports from formal representations of entity graphs. AGIR
utilizes a two-stage pipeline by combining the advantages of template-based
approaches and the capabilities of Large Language Models such as ChatGPT. We
evaluate AGIR's report generation capabilities both quantitatively and
qualitatively. The generated reports accurately convey information expressed
through formal language, achieving a high recall value (0.99) without
introducing hallucination. Furthermore, we compare the fluency and utility of
the reports with state-of-the-art approaches, showing how AGIR achieves higher
scores in terms of Syntactic Log-Odds Ratio (SLOR) and through questionnaires.
By using our tool, we estimate that the report writing time is reduced by more
than 40%, therefore streamlining the CTI production of any organization and
contributing to the automation of several CTI tasks.
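The abstract names the Syntactic Log-Odds Ratio (SLOR) as its fluency metric without defining it. As commonly defined in the fluency-evaluation literature, SLOR subtracts a sentence's unigram log-probability from its language-model log-probability and divides by sentence length, so that rare but legitimate tokens (for example malware names) are not penalized. The sketch below illustrates that definition only; it is not AGIR's evaluation code. The function names, the add-one smoothing, the toy corpus, and the placeholder language-model log-probability are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_log_probs(corpus_tokens):
    """Return a function mapping a token to its add-one-smoothed log-probability.

    Hypothetical helper: the smoothing scheme is an assumption, not from the paper.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens

    def log_prob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return log_prob


def slor(lm_log_prob, tokens, log_prob):
    """SLOR = (ln p_LM(S) - sum over tokens of ln p_unigram(t)) / |S|."""
    if not tokens:
        raise ValueError("SLOR is undefined for an empty sentence")
    unigram_lp = sum(log_prob(t) for t in tokens)
    return (lm_log_prob - unigram_lp) / len(tokens)


# Toy corpus used only to estimate unigram frequencies (assumed data).
corpus = "the actor uses malware the actor".split()
log_prob = unigram_log_probs(corpus)

# -2.0 stands in for a sentence log-probability obtained from any language model.
score = slor(-2.0, ["the", "actor"], log_prob)
```

A higher SLOR means the language model finds the sentence more probable than its word frequencies alone would predict, which is why it serves as a fluency proxy. In practice the sentence log-probability would come from a trained language model instead of a fixed constant.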
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
(2024-10-28T14:18:32Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
(2024-10-01T04:20:14Z)
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs).
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
(2024-06-30T13:02:03Z)
- Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which tracks semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
(2024-04-10T13:24:27Z)
- TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports [3.2183320563774833]
Knowing the modus operandi of adversaries aids organizations in employing efficient defensive strategies and sharing intelligence in the community.
A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and translate it into a structured format.
This research introduces a methodology named TTPXHunter for the automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports.
(2024-03-05T19:04:09Z)
- Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
(2023-12-21T11:45:28Z)
- Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild [2.4669630540735215]
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations.
Existing tools for automated structured CTI extraction have performance limitations.
We fill these gaps by providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
(2023-07-14T13:43:16Z)
- AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging "entity triggers".
Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique.
AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
(2021-09-10T08:11:56Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
(2020-10-05T20:49:26Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
(2020-06-22T21:56:14Z)