Related papers: AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation

AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation

URL: http://arxiv.org/abs/2310.02655v1
Date: Wed, 4 Oct 2023 08:25:37 GMT
Title: AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation
Authors: Filippo Perrina, Francesco Marchiori, Mauro Conti, Nino Vincenzo Verde
Abstract summary: We introduce AGIR (Automatic Generation of Intelligence Reports), a transformative tool for CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports. We evaluate AGIR's report generation capabilities both quantitatively and qualitatively.
Score: 15.43868945929965
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk management strategies. As the volume of CTI reports continues to surge, the demand for automated tools to streamline report generation becomes increasingly apparent. While Natural Language Processing techniques have shown potential in handling text data, they often struggle to address the complexity of diverse data sources and their intricate interrelationships. Moreover, established paradigms like STIX have emerged as de facto standards within the CTI community, emphasizing the formal categorization of entities and relations to facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic Generation of Intelligence Reports), a transformative Natural Language Generation tool specifically designed to address the pressing challenges in the realm of CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports from formal representations of entity graphs. AGIR utilizes a two-stage pipeline by combining the advantages of template-based approaches and the capabilities of Large Language Models such as ChatGPT. We evaluate AGIR's report generation capabilities both quantitatively and qualitatively. The generated reports accurately convey information expressed through formal language, achieving a high recall value (0.99) without introducing hallucination. Furthermore, we compare the fluency and utility of the reports with state-of-the-art approaches, showing how AGIR achieves higher scores in terms of Syntactic Log-Odds Ratio (SLOR) and through questionnaires. By using our tool, we estimate that the report writing time is reduced by more than 40%, therefore streamlining the CTI production of any organization and contributing to the automation of several CTI tasks.

Related papers

CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis [2.7862108332002546]
Cyber Threat Intelligence (CTI) sources are often unstructured and in natural language, making it difficult to automatically extract information. Recent studies have explored the use of AI to perform automatic extraction from CTI data. We introduce a novel dataset manually constructed from CTI reports and structured according to the MITRE ATT&CK framework.
arXiv Detail & Related papers (2025-04-08T09:47:15Z)
EICopilot: Search and Explore Enterprise Information over Large-scale Knowledge Graphs with LLM-driven Agents [16.65035686422735]
The paper introduces EICopilot, a novel agent-based solution enhancing search and exploration of enterprise registration data within online knowledge graphs. The solution automatically generates and executes Gremlin scripts, providing efficient summaries of complex enterprise relationships. Empirical evaluations demonstrate the superior performance of EICopilot, including speed and accuracy, over baseline methods.
arXiv Detail & Related papers (2025-01-23T15:22:25Z)
Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction [0.0]
This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction. The proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships. Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning.
arXiv Detail & Related papers (2025-01-08T12:35:17Z)
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats. Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing. We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM. We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs) Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z)
Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track the semantic changes, channel condition, to perform adaptive semantic extraction and sampling. The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports [3.2183320563774833]
Knowing the modus operandi of adversaries aids organizations in employing efficient defensive strategies and sharing intelligence in the community. A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and translate it into a structured format. This research introduces a methodology named TTPXHunter for the automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports.
arXiv Detail & Related papers (2024-03-05T19:04:09Z)
Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild [2.4669630540735215]
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. Existing tools for automated structured CTI extraction have performance limitations. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
arXiv Detail & Related papers (2023-07-14T13:43:16Z)
AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging entity triggers'' Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique. AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
arXiv Detail & Related papers (2021-09-10T08:11:56Z)
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks. We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.