ProGQL: A Provenance Graph Query System for Cyber Attack Investigation
- URL: http://arxiv.org/abs/2510.22400v2
- Date: Wed, 29 Oct 2025 18:56:07 GMT
- Title: ProGQL: A Provenance Graph Query System for Cyber Attack Investigation
- Authors: Fei Shao, Jia Zou, Zhichao Cao, Xusheng Xiao
- Abstract summary: Provenance analysis (PA) has emerged as an important solution for cyber attack investigation. Existing PA techniques are inflexible and non-extensible, making it difficult to incorporate analyst expertise. We propose the ProGQL framework, which provides a domain-specific graph search language with a well-engineered query engine.
- Score: 6.954627558521413
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Provenance analysis (PA) has recently emerged as an important solution for cyber attack investigation. PA uses system monitoring to record system activities as a series of system audit events and organizes these events as a provenance graph that shows the dependencies among system activities, which can reveal the steps of cyber attacks. Despite their potential, existing PA techniques face two critical challenges: (1) they are inflexible and non-extensible, making it difficult to incorporate analyst expertise, and (2) they are memory inefficient, often requiring more than 100 GB of RAM to hold entire event streams, which fundamentally limits scalability and deployment in real-world environments. To address these limitations, we propose the ProGQL framework, which provides a domain-specific graph search language with a well-engineered query engine, allowing PA over system audit events and expert knowledge to be jointly expressed as a graph search query and thereby facilitating the investigation of complex cyber attacks. In particular, to support dependency searches from a starting edge required in PA, ProGQL introduces new language constructs for constrained graph traversal, edge weight computation, value propagation along weighted edges, and graph merging to integrate multiple searches. Moreover, the ProGQL query engine is optimized for efficient incremental graph search across heterogeneous database backends, eliminating the need for full in-memory materialization and reducing memory overhead. Our evaluations on real attacks demonstrate the effectiveness of the ProGQL language in expressing a diverse set of complex attacks compared with the state-of-the-art graph query language Cypher, and a comparison with the state-of-the-art PA technique DEPIMPACT further demonstrates the significant scalability improvement brought by the design of the ProGQL framework.
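To make the idea of a dependency search from a starting edge concrete, the following is a minimal Python sketch of a time-constrained backward traversal over audit events. It is not ProGQL syntax and not the authors' engine: the `Event` type, the `backward_search` function, the `max_hops` parameter, and the sample events are all hypothetical, and the temporal rule used (a cause must not occur after its effect) is the common backtracking heuristic, assumed here for illustration.

```python
from collections import deque
from typing import NamedTuple


class Event(NamedTuple):
    """One audit event, viewed as a directed edge (hypothetical schema)."""
    src: str   # entity the data flows from
    dst: str   # entity the data flows to
    time: int  # timestamp of the event


def backward_search(events, poi, max_hops=None):
    """Return the set of events that may have causally influenced `poi`."""
    # Index events by destination so incoming edges are quick to look up.
    incoming = {}
    for e in events:
        incoming.setdefault(e.dst, []).append(e)

    found = {poi}
    queue = deque([(poi, 0)])
    while queue:
        cur, hops = queue.popleft()
        # Optional traversal constraint: stop expanding past max_hops.
        if max_hops is not None and hops >= max_hops:
            continue
        for e in incoming.get(cur.src, []):
            # Temporal constraint: a cause cannot happen after its effect.
            if e.time <= cur.time and e not in found:
                found.add(e)
                queue.append((e, hops + 1))
    return found


# Hypothetical events: a download, an execution, and a suspicious write.
e1 = Event("curl", "payload.sh", 1)
e2 = Event("payload.sh", "bash", 2)
poi = Event("bash", "secret.tar", 3)   # starting edge of the search
late = Event("notes.txt", "bash", 5)   # happens after poi, so not a cause
events = [e1, e2, poi, late]

full = backward_search(events, poi)
one_hop = backward_search(events, poi, max_hops=1)
```

Here `full` contains the starting edge plus the two earlier events on its causal chain, while `late` is excluded by the temporal check. Results of several such searches could be combined with a plain set union, which is one simple way to picture merging multiple searches.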
Related papers
- HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG [53.30561659838455]
Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations. Retrieval-Augmented Generation (RAG) frequently overlooks structural interdependencies essential for multi-hop reasoning. HELP achieves competitive performance across multiple simple and multi-hop QA benchmarks and up to a 28.8$\times$ speedup over leading Graph-based RAG baselines.
arXiv Detail & Related papers (2026-02-24T14:05:29Z) - Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs [12.14017207383674]
Large language models (LLMs) often struggle with knowledge-intensive tasks due to hallucinations and outdated parametric knowledge. Retrieval-Augmented Generation (RAG) addresses this by integrating external corpora, but its effectiveness is limited by fragmented information in unstructured domain documents. GraphRAG emerged to enhance contextual reasoning through structured knowledge graphs, yet paradoxically underperforms vanilla RAG in real-world scenarios. We propose EA-GraphRAG that dynamically integrates RAG and GraphRAG paradigms through syntax-aware complexity analysis.
arXiv Detail & Related papers (2026-02-03T14:26:28Z) - Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems [29.89127594311822]
Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA, a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline.
arXiv Detail & Related papers (2026-01-21T05:20:54Z) - A Navigational Approach for Comprehensive RAG via Traversal over Proposition Graphs [23.840376380790783]
ToPG models its knowledge base as a heterogeneous graph of propositions, entities, and passages. ToPG demonstrates strong performance across both accuracy- and quality-based metrics.
arXiv Detail & Related papers (2026-01-08T11:50:40Z) - Deterministic Legal Retrieval: An Action API for Querying the SAT-Graph RAG [0.0]
This paper introduces the SAT-Graph API, a formal query execution layer centered on canonical actions. We show how planner-guided agents can decompose complex queries into Directed Acyclic Graphs. This architecture transforms retrieval from an opaque black box to a transparent, auditable process.
arXiv Detail & Related papers (2025-10-07T15:04:23Z) - Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
arXiv Detail & Related papers (2025-09-25T06:48:52Z) - GRIL: Knowledge Graph Retrieval-Integrated Learning with Large Language Models [59.72897499248909]
We propose a novel graph retriever trained end-to-end with Large Language Models (LLMs). Within the extracted subgraph, structural knowledge and semantic features are encoded via soft tokens and the verbalized graph, respectively, which are infused into the LLM together. Our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-20T02:38:00Z) - Enhancing GraphQL Security by Detecting Malicious Queries Using Large Language Models, Sentence Transformers, and Convolutional Neural Networks [0.0]
API's flexibility, while beneficial for efficient data fetching, introduces security vulnerabilities that traditional API security mechanisms often fail to address. Malicious queries can exploit the language's dynamic nature, leading to denial-of-service attacks, data exfiltration through injection, and other exploits. This paper presents a novel, AI-driven approach for real-time detection of malicious queries.
arXiv Detail & Related papers (2025-08-14T07:35:11Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z) - DO-RAG: A Domain-Specific QA Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation [4.113142669523488]
Domain-specific QA systems require not just generative fluency but high factual accuracy grounded in structured expert knowledge. We propose DO-RAG, a scalable and customizable hybrid QA framework that integrates multi-level knowledge graph construction with semantic vector retrieval.
arXiv Detail & Related papers (2025-05-17T06:40:17Z) - RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline. RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components. Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z) - Scalable Defect Detection via Traversal on Code Graph [10.860910384163892]
We introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities.
It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency.
For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
arXiv Detail & Related papers (2024-06-12T11:24:52Z) - It Is Time To Steer: A Scalable Framework for Analysis-driven Attack Graph Generation [50.06412862964449]
Attack Graph (AG) represents the best-suited solution to support cyber risk assessment for multi-step attacks on computer networks.
Current solutions propose to address the generation problem from the algorithmic perspective and postulate the analysis only after the generation is complete.
This paper rethinks the classic AG analysis through a novel workflow in which the analyst can query the system anytime.
arXiv Detail & Related papers (2023-12-27T10:44:58Z) - Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task of graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)