Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
- URL: http://arxiv.org/abs/2307.10214v1
- Date: Fri, 14 Jul 2023 13:43:16 GMT
- Title: Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
- Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco
- Abstract summary: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations.
Existing tools for automated structured CTI extraction have performance limitations.
We fill these gaps by providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and
enhancing security for organizations. However, extracting relevant information
from unstructured text sources can be expensive and time-consuming. Our
empirical experience shows that existing tools for automated structured CTI
extraction have performance limitations. Furthermore, the community lacks a
common benchmark to quantitatively assess their performance. We fill these gaps
by providing a new large open benchmark dataset and aCTIon, a structured CTI
information extraction tool. The dataset includes 204 real-world, publicly
available reports and their corresponding structured CTI information in STIX
format. Our team curated the dataset with three independent groups of CTI
analysts working over the course of several months. To the best of our
knowledge, this dataset is two orders of magnitude larger than previously
released open-source datasets. We then design aCTIon, which leverages recently
introduced large language models (GPT-3.5) within two custom information
extraction pipelines. We compare our method with 10 solutions presented in
previous work, developing our own implementations where open-source ones were
lacking. Our results show that aCTIon outperforms previous work for structured
CTI extraction, improving the F1-score by 10 to 50 percentage points across all
tasks.
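The abstract reports gains as differences in F1-score expressed in percentage points. As a rough illustration of what such a score measures, here is a minimal Python sketch of set-based F1 over extracted STIX objects. It assumes, hypothetically, that each object is reduced to a `(type, name)` pair and matched exactly against the analyst-curated ground truth; the paper's actual matching rules and per-task scoring may differ, and the report contents below are invented for illustration.

```python
from typing import Set, Tuple

# Hypothetical key for a STIX object: (STIX type, name or ATT&CK ID).
# Exact-match comparison of these pairs is an assumption for illustration,
# not the matching rule used in the paper.
StixKey = Tuple[str, str]


def f1_score(predicted: Set[StixKey], gold: Set[StixKey]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Also covers empty prediction or empty ground-truth sets.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented ground truth for one report (T1566 = Phishing,
# T1059 = Command and Scripting Interpreter in MITRE ATT&CK).
gold = {
    ("malware", "ExampleRAT"),
    ("attack-pattern", "T1566"),
    ("attack-pattern", "T1059"),
    ("intrusion-set", "ExampleGroup"),
}
# Invented outputs of two hypothetical extractors.
baseline = {("malware", "ExampleRAT"), ("attack-pattern", "T1566"), ("tool", "PowerShell")}
improved = {("malware", "ExampleRAT"), ("attack-pattern", "T1566"), ("attack-pattern", "T1059")}

f1_base = f1_score(baseline, gold)  # P = 2/3, R = 2/4
f1_new = f1_score(improved, gold)   # P = 3/3, R = 3/4
# A gain in percentage points is the absolute F1 difference scaled by 100.
delta_pp = (f1_new - f1_base) * 100
print(f"{f1_base:.3f} {f1_new:.3f} {delta_pp:.1f}")  # prints "0.571 0.857 28.6"
```

Because the gain is an absolute difference between two F1 values, a move from 0.571 to 0.857 is about 28.6 percentage points, whereas the relative improvement would be 50%.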
Related papers
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
(2024-07-01T18:58:22Z)
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advances in Large Language Models (LLMs) and Knowledge Graphs (KGs).
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
(2024-06-30T13:02:03Z)
- DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS).
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
(2024-03-29T22:59:34Z)
- Automated Contrastive Learning Strategy Search for Time Series
Contrastive Learning (CL) has become a predominant representation learning paradigm for time series.
We present an Automated Machine Learning (AutoML) practice at Microsoft that automatically learns how to contrastively learn representations for various time series datasets.
(2024-03-19T11:24:14Z)
- On the Cross-Dataset Generalization of Machine Learning for Network Intrusion Detection
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis of the generalization of machine-learning-based NIDS through extensive experiments in a cross-dataset framework.
(2024-02-15T14:39:58Z)
- TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy.
Previous research has focused on improving individual components of the extraction process.
The community lacks open-source platforms for deploying streaming CTI data pipelines in the wild.
(2024-02-15T14:29:21Z)
- Learning to Extract Structured Entities Using Language Models
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights.
We introduce a new model that harnesses the power of Language Models (LMs) for enhanced effectiveness and efficiency.
(2024-02-06T22:15:09Z)
- AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation
We introduce AGIR (Automatic Generation of Intelligence Reports), a transformative tool for CTI reporting.
AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports.
We evaluate AGIR's report generation capabilities both quantitatively and qualitatively.
(2023-10-04T08:25:37Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy
Large Language Models (LLMs) struggle to provide current information due to outdated pre-training data.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in generalizing new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities.
(2023-05-29T19:48:37Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks.
(2023-05-24T12:15:19Z)
- Cognitive Computing to Optimize IT Services
A cognitive solution goes beyond traditional structured data analysis by deeply analyzing both structured and unstructured text.
In experiments, the proposed approach reduced yearly ticket volume by up to 18-25%.
(2021-12-28T09:56:44Z)