Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the
Wild
- URL: http://arxiv.org/abs/2307.10214v1
- Date: Fri, 14 Jul 2023 13:43:16 GMT
- Title: Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the
Wild
- Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan
Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro
Kakumaru, Roberto Bifulco
- Abstract summary: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations.
Existing tools for automated structured CTI extraction have performance limitations.
We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
- Score: 2.4669630540735215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and
enhancing security for organizations. However, the process of extracting
relevant information from unstructured text sources can be expensive and
time-consuming. Our empirical experience shows that existing tools for
automated structured CTI extraction have performance limitations. Furthermore,
the community lacks a common benchmark to quantitatively assess their
performance. We fill these gaps providing a new large open benchmark dataset
and aCTIon, a structured CTI information extraction tool. The dataset includes
204 real-world publicly available reports and their corresponding structured
CTI information in STIX format. Our team curated the dataset involving three
independent groups of CTI analysts working over the course of several months.
To the best of our knowledge, this dataset is two orders of magnitude larger
than previously released open source datasets. We then design aCTIon,
leveraging recently introduced large language models (GPT3.5) in the context of
two custom information extraction pipelines. We compare our method with 10
solutions presented in previous work, for which we develop our own
implementations when open-source implementations were lacking. Our results show
that aCTIon outperforms previous work for structured CTI extraction with an
improvement of the F1-score from 10%points to 50%points across all tasks.
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - CTISum: A New Benchmark Dataset For Cyber Threat Intelligence Summarization [14.287652216484863]
We present CTISum, a new benchmark for CTI summarization task.
Considering the importance of attack process, a novel fine-grained subtask of attack process summarization is proposed.
arXiv Detail & Related papers (2024-08-13T02:25:16Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs)
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z) - Automated Contrastive Learning Strategy Search for Time Series [48.68664732145665]
We present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns Contrastive Learning (AutoCL) for time series datasets and tasks.
We first construct a principled search space of size over $3times1012$, covering data augmentation, embedding transformation, contrastive pair construction, and contrastive losses.
Further, we introduce an efficient reinforcement learning algorithm, which optimize CLS from the performance on the validation tasks, to obtain effective CLS within the space.
arXiv Detail & Related papers (2024-03-19T11:24:14Z) - On the Cross-Dataset Generalization of Machine Learning for Network
Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis on the generalization of machine-learning-based NIDS through an extensive experimentation in a cross-dataset framework.
arXiv Detail & Related papers (2024-02-15T14:39:58Z) - TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild [0.06597195879147556]
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy.
Previous research has focused on improving individual components of the extraction process.
The community lacks open-source platforms for deploying streaming CTI data pipelines in the wild.
arXiv Detail & Related papers (2024-02-15T14:29:21Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text
Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.