GenDFIR: Advancing Cyber Incident Timeline Analysis Through Retrieval Augmented Generation and Large Language Models
- URL: http://arxiv.org/abs/2409.02572v4
- Date: Fri, 27 Dec 2024 13:29:14 GMT
- Title: GenDFIR: Advancing Cyber Incident Timeline Analysis Through Retrieval Augmented Generation and Large Language Models
- Authors: Fatma Yasmine Loumachi, Mohamed Chahine Ghanem, Mohamed Amine Ferrag
- Abstract summary: Cyber timeline analysis is crucial in Digital Forensics and Incident Response (DFIR).
Traditional methods rely on structured artefacts, such as logs and metadata, for evidence identification and feature extraction.
This paper introduces GenDFIR, a framework leveraging large language models (LLMs), specifically Llama 3.1 8B in zero-shot mode, integrated with a Retrieval-Augmented Generation (RAG) agent.
- Score: 0.08192907805418582
- License:
- Abstract: Cyber timeline analysis, or forensic timeline analysis, is crucial in Digital Forensics and Incident Response (DFIR). It examines artefacts and events, particularly timestamps and metadata, to detect anomalies, establish correlations, and reconstruct incident timelines. Traditional methods rely on structured artefacts, such as logs and filesystem metadata, using specialised tools for evidence identification and feature extraction. This paper introduces GenDFIR, a framework leveraging large language models (LLMs), specifically Llama 3.1 8B in zero-shot mode, integrated with a Retrieval-Augmented Generation (RAG) agent. Incident data is preprocessed into a structured knowledge base, enabling the RAG agent to retrieve relevant events based on user prompts. The LLM interprets this context, offering semantic enrichment. Tested on synthetic data in a controlled environment, results demonstrate GenDFIR's reliability and robustness, showcasing LLMs' potential to automate timeline analysis and advance threat detection.
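The retrieval step described in the abstract (incident events stored in a knowledge base, then selected by relevance to a user prompt and handed to the LLM as context) can be illustrated with a minimal sketch. This is not GenDFIR's implementation: the events are hypothetical synthetic data, bag-of-words cosine similarity stands in for whatever embedding-based retrieval the framework uses, the function names are invented for illustration, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical synthetic incident events: (timestamp, description).
EVENTS = [
    ("2024-05-01T09:12:03Z", "User alice logged in from 10.0.0.5"),
    ("2024-05-01T09:14:47Z", "Failed login for admin from 203.0.113.7"),
    ("2024-05-01T09:15:02Z", "Failed login for admin from 203.0.113.7"),
    ("2024-05-01T09:20:31Z", "Scheduled backup completed successfully"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/dotted tokens."""
    return re.findall(r"[a-z0-9.]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, events, k=2):
    """Return the k events most similar to the query, in chronological order."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(desc))), ts, desc) for ts, desc in events]
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # ISO 8601 timestamps in the same zone sort chronologically as strings.
    return sorted((ts, desc) for _, ts, desc in top)


def build_prompt(query, retrieved):
    """Assemble the context that would be passed to the LLM (call omitted)."""
    context = "\n".join(f"{ts} {desc}" for ts, desc in retrieved)
    return f"Context events:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "failed login admin"
    print(build_prompt(question, retrieve(question, EVENTS)))
```

Returning the retrieved events in timestamp order rather than score order is a design choice made here so that the context already reads as a timeline.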
Related papers
- See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers [23.701716999879636]
Time series anomaly detection (TSAD) is becoming increasingly vital due to the rapid growth of time series data.
We introduce a pioneering framework called the Time Series Anomaly Multimodal Analyzer (TAMA) to enhance both the detection and interpretation of anomalies.
arXiv Detail & Related papers (2024-11-04T10:28:41Z)
- Metadata Matters for Time Series: Informative Forecasting with Transformers [70.38241681764738]
We propose a Metadata-informed Time Series Transformer (MetaTST) for time series forecasting.
To tackle the unstructured nature of metadata, MetaTST formalizes them into natural languages by pre-designed templates.
A Transformer encoder is employed to communicate series and metadata tokens, which can extend series representations by metadata information.
arXiv Detail & Related papers (2024-10-04T11:37:55Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a parameter-efficient federated anomaly detection framework named PeFAD, in response to increasing privacy concerns.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- It Is Time To Steer: A Scalable Framework for Analysis-driven Attack Graph Generation [50.06412862964449]
Attack Graph (AG) represents the best-suited solution to support cyber risk assessment for multi-step attacks on computer networks.
Current solutions propose to address the generation problem from the algorithmic perspective and postulate the analysis only after the generation is complete.
This paper rethinks the classic AG analysis through a novel workflow in which the analyst can query the system anytime.
arXiv Detail & Related papers (2023-12-27T10:44:58Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- RESAM: Requirements Elicitation and Specification for Deep-Learning Anomaly Models with Applications to UAV Flight Controllers [24.033936757739617]
We present RESAM, a requirements process that integrates knowledge from domain experts, discussion forums, and formal product documentation.
We present a case-study based on a flight control system for small Uncrewed Aerial Systems and demonstrate that its use guides the construction of effective anomaly detection models.
arXiv Detail & Related papers (2022-07-18T18:09:59Z)
- A Review of Open Source Software Tools for Time Series Analysis [0.0]
This paper describes a typical Time Series Analysis (TSA) framework with an architecture and lists the main features of the TSA framework.
Overall, this article considered 60 time series analysis tools, of which 32 provided forecasting modules and 21 included anomaly detection.
arXiv Detail & Related papers (2022-03-10T07:12:20Z)
- Anomaly Detection for Aggregated Data Using Multi-Graph Autoencoder [21.81622481466591]
We focus on creating anomaly detection models for system logs.
We present a thorough analysis of the aggregated data and the relationships between aggregated events.
We propose a multiple-graph autoencoder (MGAE), a novel convolutional graph autoencoder model.
arXiv Detail & Related papers (2021-01-11T17:38:42Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- A Causal-based Framework for Multimodal Multivariate Time Series Validation Enhanced by Unsupervised Deep Learning as an Enabler for Industry 4.0 [0.0]
A conceptual validation framework for multi-level contextual anomaly detection is developed.
A Long Short-Term Memory Autoencoder is successfully evaluated to validate the learnt representation of contexts associated with multiple assets of a blast furnace.
A research roadmap is identified to combine causal discovery and representation learning as an enabler for unsupervised Root Cause Analysis applied to the process industry.
arXiv Detail & Related papers (2020-08-05T14:48:02Z)
- Meta-learning framework with applications to zero-shot time-series forecasting [82.61728230984099]
This work provides positive evidence using a broad meta-learning framework.
Residual connections act as a meta-learning adaptation mechanism.
We show that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining.
arXiv Detail & Related papers (2020-02-07T16:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.