Related papers: Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild

Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild

URL: http://arxiv.org/abs/2307.10214v1
Date: Fri, 14 Jul 2023 13:43:16 GMT
Title: Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco
Abstract summary: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. Existing tools for automated structured CTI extraction have performance limitations. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
Score: 2.4669630540735215
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. However, the process of extracting relevant information from unstructured text sources can be expensive and time-consuming. Our empirical experience shows that existing tools for automated structured CTI extraction have performance limitations. Furthermore, the community lacks a common benchmark to quantitatively assess their performance. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool. The dataset includes 204 real-world publicly available reports and their corresponding structured CTI information in STIX format. Our team curated the dataset involving three independent groups of CTI analysts working over the course of several months. To the best of our knowledge, this dataset is two orders of magnitude larger than previously released open source datasets. We then design aCTIon, leveraging recently introduced large language models (GPT3.5) in the context of two custom information extraction pipelines. We compare our method with 10 solutions presented in previous work, for which we develop our own implementations when open-source implementations were lacking. Our results show that aCTIon outperforms previous work for structured CTI extraction with an improvement of the F1-score from 10%points to 50%points across all tasks.

Related papers

CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis [2.7862108332002546]
Cyber Threat Intelligence (CTI) sources are often unstructured and in natural language, making it difficult to automatically extract information. Recent studies have explored the use of AI to perform automatic extraction from CTI data. We introduce a novel dataset manually constructed from CTI reports and structured according to the MITRE ATT&CK framework.
arXiv Detail & Related papers (2025-04-08T09:47:15Z)
Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction [0.0]
This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction. The proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships. Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning.
arXiv Detail & Related papers (2025-01-08T12:35:17Z)
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats. Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose a method for generating synthetic datasets, given a small number of user-written few-shots that demonstrate the task to be performed.<n>We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks: biology, medicine, and commonsense question-answering (QA)<n>Our experiments show that CRAFT-based models outperform or match general LLMs on QA tasks, while exceeding models trained on human-curated summarization data by 46 preference points.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
CTISum: A New Benchmark Dataset For Cyber Threat Intelligence Summarization [14.287652216484863]
We present CTISum, a new benchmark for CTI summarization task. Considering the importance of attack process, a novel fine-grained subtask of attack process summarization is proposed.
arXiv Detail & Related papers (2024-08-13T02:25:16Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs) Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z)
Automated Contrastive Learning Strategy Search for Time Series [48.68664732145665]
We present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns Contrastive Learning (AutoCL) for time series datasets and tasks. We first construct a principled search space of size over $3times1012$, covering data augmentation, embedding transformation, contrastive pair construction, and contrastive losses. Further, we introduce an efficient reinforcement learning algorithm, which optimize CLS from the performance on the validation tasks, to obtain effective CLS within the space.
arXiv Detail & Related papers (2024-03-19T11:24:14Z)
On the Cross-Dataset Generalization of Machine Learning for Network Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity. Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications. In this study, we conduct a comprehensive analysis on the generalization of machine-learning-based NIDS through an extensive experimentation in a cross-dataset framework.
arXiv Detail & Related papers (2024-02-15T14:39:58Z)
TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild [0.06597195879147556]
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy. Previous research has focused on improving individual components of the extraction process. The community lacks open-source platforms for deploying streaming CTI data pipelines in the wild.
arXiv Detail & Related papers (2024-02-15T14:29:21Z)
Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction. We reformulate the task to be entity-centric, enabling the use of diverse metrics. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.