Related papers: GraphWeaver: Billion-Scale Cybersecurity Incident Correlation

URL: http://arxiv.org/abs/2406.01842v1
Date: Mon, 3 Jun 2024 23:28:05 GMT
Title: GraphWeaver: Billion-Scale Cybersecurity Incident Correlation
Authors: Scott Freitas, Amir Gharib,
Abstract summary: We introduce GraphWeaver, an industry-scale framework that shifts the traditional incident correlation process to a data-optimized, geo-distributed graph-based approach. GraphWeaver is integrated into the Microsoft Defender XDR product and deployed worldwide, handling billions of correlations with a 99% accuracy rate. This integration has not only maintained high correlation accuracy but also reduced traditional correlation storage requirements by 7.4x.
Score: 2.2572772235310934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the dynamic landscape of large enterprise cybersecurity, accurately and efficiently correlating billions of security alerts into comprehensive incidents is a substantial challenge. Traditional correlation techniques often struggle with maintenance, scaling, and adapting to emerging threats and novel sources of telemetry. We introduce GraphWeaver, an industry-scale framework that shifts the traditional incident correlation process to a data-optimized, geo-distributed graph-based approach. GraphWeaver introduces a suite of innovations tailored to handle the complexities of correlating billions of shared evidence alerts across hundreds of thousands of enterprises. Key among these innovations are a geo-distributed database and PySpark analytics engine for large-scale data processing, a minimum spanning tree algorithm to optimize correlation storage, integration of security domain knowledge and threat intelligence, and a human-in-the-loop feedback system to continuously refine key correlation processes and parameters. GraphWeaver is integrated into the Microsoft Defender XDR product and deployed worldwide, handling billions of correlations with a 99% accuracy rate, as confirmed by customer feedback and extensive investigations by security experts. This integration has not only maintained high correlation accuracy but also reduced traditional correlation storage requirements by 7.4x. We provide an in-depth overview of the key design and operational features of GraphWeaver, setting a precedent as the first cybersecurity company to openly discuss these critical capabilities at this level of depth.
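
The abstract mentions a minimum spanning tree algorithm for optimizing correlation storage but does not describe it. As a rough illustration of the general idea only, and not GraphWeaver's actual implementation, the sketch below applies Kruskal's algorithm with a union-find structure to a small, made-up alert graph. The alert names, the edge weights, and the convention that a lower weight means stronger shared evidence are all assumptions made for this example.

```python
from typing import Hashable, Iterable

# (weight, alert_a, alert_b); hypothetical convention: lower weight = stronger evidence
Edge = tuple[float, Hashable, Hashable]


def spanning_forest(edges: Iterable[Edge]) -> list[Edge]:
    """Kruskal's algorithm: keep only the edges needed to connect each component."""
    parent: dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept: list[Edge] = []
    for weight, a, b in sorted(edges, key=lambda e: e[0]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # This edge joins two previously separate groups of alerts, so keep it.
            parent[root_a] = root_b
            kept.append((weight, a, b))
        # Otherwise the two alerts are already connected and the edge is redundant.
    return kept


# Made-up correlations between six alerts forming two incidents.
edges = [
    (0.1, "alert1", "alert2"),
    (0.4, "alert2", "alert3"),
    (0.2, "alert1", "alert3"),
    (0.3, "alert3", "alert4"),
    (0.5, "alert5", "alert6"),
]
forest = spanning_forest(edges)
print(f"stored {len(forest)} of {len(edges)} edges")
```

Every alert stays in the same connected component (incident) as before, but a component with n alerts needs only n - 1 stored edges, which is the kind of saving a spanning-tree representation can offer.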

Related papers

Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
Efficient and Privacy-Preserved Link Prediction via Condensed Graphs [49.898152180805454]
We introduce HyDRO+, a graph condensation method guided by algebraic Jaccard similarity. Our method achieves nearly 20× faster training and reduces storage requirements by 452×, compared to link prediction on the original networks.
arXiv Detail & Related papers (2025-03-15T14:54:04Z)
VulRG: Multi-Level Explainable Vulnerability Patch Ranking for Complex Systems Using Graphs [20.407534993667607]
This work introduces a graph-based framework for vulnerability patch prioritization. It integrates diverse data sources and metrics into a universally applicable model. Refined risk metrics enable detailed assessments at the component, asset, and system levels.
arXiv Detail & Related papers (2025-02-16T14:21:52Z)
RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to capture the unique characteristics of relational databases. At the core of our approach is the introduction of atomic routes, which are sequences of nodes forming high-order tripartite structures. RelGNN consistently achieves state-of-the-art accuracy with up to 25% improvement.
arXiv Detail & Related papers (2025-02-10T18:58:40Z)
Federated Granger Causality Learning for Interdependent Clients with State Space Representation [0.6499759302108926]
We develop a federated approach to learning Granger causality. We propose augmenting the client models with the Granger causality information learned by the server. We also study the convergence of the framework to a centralized oracle model.
arXiv Detail & Related papers (2025-01-23T18:04:21Z)
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data. The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
Privacy-Preserving Intrusion Detection using Convolutional Neural Networks [0.25163931116642785]
We explore the use case of a model owner providing an analytic service on customer's private data. No information about the data shall be revealed to the analyst and no information about the model shall be leaked to the customer. We enhance an attack detection system based on Convolutional Neural Networks with privacy-preserving technology based on PriMIA framework.
arXiv Detail & Related papers (2024-04-15T09:56:36Z)
Attention-GAN for Anomaly Detection: A Cutting-Edge Approach to Cybersecurity Threat Management [0.0]
This paper proposes an innovative Attention-GAN framework for enhancing cybersecurity, focusing on anomaly detection. The proposed approach aims to generate diverse and realistic synthetic attack scenarios, thereby enriching the dataset and improving threat identification. Integrating attention mechanisms with Generative Adversarial Networks (GANs) is a key feature of the proposed method. The attention-GAN framework has emerged as a pioneering approach, setting a new benchmark for advanced cyber-defense strategies.
arXiv Detail & Related papers (2024-02-25T01:10:55Z)
It Is Time To Steer: A Scalable Framework for Analysis-driven Attack Graph Generation [50.06412862964449]
Attack Graph (AG) represents the best-suited solution to support cyber risk assessment for multi-step attacks on computer networks. Current solutions propose to address the generation problem from the algorithmic perspective and postulate the analysis only after the generation is complete. This paper rethinks the classic AG analysis through a novel workflow in which the analyst can query the system anytime.
arXiv Detail & Related papers (2023-12-27T10:44:58Z)
Fed-urlBERT: Client-side Lightweight Federated Transformers for URL Threat Analysis [6.552094912099549]
Fed-urlBERT is a federated URL pre-trained model designed to address both privacy concerns and the need for cross-domain collaboration in cybersecurity. Our approach achieves performance comparable to a centralized model under both independently and identically distributed (IID) and two non-IID data scenarios.
arXiv Detail & Related papers (2023-12-06T17:31:16Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks, such as malware, spam, and intrusions, has caused severe consequences for society. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
Privacy-preserving Graph Analytics: Secure Generation and Federated Learning [72.90158604032194]
We focus on the privacy-preserving analysis of graph data, which provides the crucial capacity to represent rich attributes and relationships. We discuss two directions, namely privacy-preserving graph generation and federated graph learning, which can jointly enable the collaboration among multiple parties each possessing private graph data.
arXiv Detail & Related papers (2022-06-30T18:26:57Z)
Efficient Logistic Regression with Local Differential Privacy [0.0]
Internet of Things devices are expanding rapidly and generating huge amounts of data. There is an increasing need to explore data collected from these devices. Collaborative learning provides a strategic solution for Internet of Things settings but also raises public concern over data privacy.
arXiv Detail & Related papers (2022-02-05T22:44:03Z)
Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
PicoDomain: A Compact High-Fidelity Cybersecurity Dataset [0.9281671380673305]
Current cybersecurity datasets either offer no ground truth or do so with anonymized data. Most existing datasets are large enough to make them unwieldy during prototype development. In this paper we have developed the PicoDomain dataset, a compact high-fidelity collection of Zeek logs from a realistic intrusion.
arXiv Detail & Related papers (2020-08-20T20:18:04Z)
Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction. FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism. It is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.