APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training
- URL: http://arxiv.org/abs/2511.20290v1
- Date: Tue, 25 Nov 2025 13:20:12 GMT
- Title: APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training
- Authors: Xuebo Qiu, Mingqi Lv, Yimei Zhang, Tieming Chen, Tiantian Zhu, Qijie Song, Shouling Ji,
- Abstract summary: Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs.<n>A fundamental challenge in this paradigm lies in the modality gap--the structural and semantic disconnect between provenance graphs and CTI reports.<n>We present APT-CGLP, a novel cross-modal APT hunting system via Contrastive Graph-Language Pre-training.
- Score: 33.84587345029278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs. A fundamental challenge in this paradigm lies in the modality gap--the structural and semantic disconnect between provenance graphs and CTI reports. Prior work addresses this by framing threat hunting as a graph matching task: 1) extracting attack graphs from CTI reports, and 2) aligning them with provenance graphs. However, this pipeline incurs severe \textit{information loss} during graph extraction and demands intensive manual curation, undermining scalability and effectiveness. In this paper, we present APT-CGLP, a novel cross-modal APT hunting system via Contrastive Graph-Language Pre-training, facilitating end-to-end semantic matching between provenance graphs and CTI reports without human intervention. First, empowered by the Large Language Model (LLM), APT-CGLP mitigates data scarcity by synthesizing high-fidelity provenance graph-CTI report pairs, while simultaneously distilling actionable insights from noisy web-sourced CTIs to improve their operational utility. Second, APT-CGLP incorporates a tailored multi-objective training algorithm that synergizes contrastive learning with inter-modal masked modeling, promoting cross-modal attack semantic alignment at both coarse- and fine-grained levels. Extensive experiments on four real-world APT datasets demonstrate that APT-CGLP consistently outperforms state-of-the-art threat hunting baselines in terms of accuracy and efficiency.
Related papers
- Autonomous Chain-of-Thought Distillation for Graph-Based Fraud Detection [73.9189065770752]
Graph-based fraud detection on text-attributed graphs (TAGs) requires jointly modeling rich textual semantics and relational dependencies.<n>We propose FraudCoT, a unified framework that advances TAG-based fraud detection through autonomous, graph-aware chain-of-thought (CoT) reasoning and scalable LLM-GNN co-training.
arXiv Detail & Related papers (2026-01-30T13:12:12Z) - TPPR: APT Tactic / Technique Pattern Guided Attack Path Reasoning for Attack Investigation [0.0]
We propose TPPR, a novel framework that first extracts anomaly subgraphs through abnormal node detection, TTP-annotation and graph pruning.<n> TPPR's capability to achieve 99.9% graph simplification (700,000 to 20 edges) while preserving 91% of critical attack nodes, outperforming state-of-the-art solutions (SPARSE, DepImpact) by 63.1% and 67.9% in reconstruction precision while maintaining attack scenario integrity.
arXiv Detail & Related papers (2025-10-25T07:13:07Z) - Towards Effective Identification of Attack Techniques in Cyber Threat Intelligence Reports using Large Language Models [5.304267859042463]
This work evaluates the performance of Cyber Threat Intelligence (CTI) extraction methods in identifying attack techniques from threat reports available on the web.<n>We analyse four configurations utilising state-of-the-art tools, including the Threat Report ATT&CK Mapper (TRAM) and open-source Large Language Models (LLMs) such as Llama2.<n>Our findings reveal significant challenges, including class imbalance, overfitting, and domain-specific complexity, which impede accurate technique extraction.
arXiv Detail & Related papers (2025-05-06T03:43:12Z) - CONTINUUM: Detecting APT Attacks through Spatial-Temporal Graph Neural Networks [0.9553673944187253]
Advanced Persistent Threats (APTs) represent a significant challenge in cybersecurity.<n>Traditional Intrusion Detection Systems (IDS) often fall short in detecting these multi-stage attacks.
arXiv Detail & Related papers (2025-01-06T12:43:59Z) - CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.<n>Current CTI knowledge extraction methods lack flexibility and generalizability.<n>We propose CTINexus, a novel framework for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks [50.19590901147213]
Graph neural networks (GNNs) have become instrumental in diverse real-world applications, offering powerful graph learning capabilities.
GNNs are vulnerable to adversarial attacks, including membership inference attacks (MIA)
This paper proposes an effective two-stage defense, Graph Transductive Defense (GTD), tailored to graph transductive learning characteristics.
arXiv Detail & Related papers (2024-06-12T06:36:37Z) - APT-MMF: An advanced persistent threat actor attribution method based on
multimodal and multilevel feature fusion [10.562355854634566]
Threat actor attribution is a crucial defense strategy for combating advanced persistent threats (APTs)
Here, we propose an APT actor attribution method based on multimodal and multilevel feature fusion (APT-MMF)
We show that our method not only outperforms the existing methods but also demonstrates its good interpretability for attribution analysis tasks.
arXiv Detail & Related papers (2024-02-20T06:19:55Z) - NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation [15.803901489811318]
NodLink is the first online detection system that maintains high detection accuracy without sacrificing detection granularity.
We propose a novel design of in-memory cache, an efficient attack screening method, and a new approximation algorithm that is more efficient than the conventional one in APT attack detection.
arXiv Detail & Related papers (2023-11-04T05:36:59Z) - CVE-driven Attack Technique Prediction with Semantic Information
Extraction and a Domain-specific Language Model [2.1756081703276]
The paper introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE descriptions and infer plausible TTP attacks resulting from CVE exploitation.
TTPpredictor overcomes challenges posed by limited labeled data and semantic disparities between CVE and TTP descriptions.
The paper presents an empirical assessment, demonstrating TTPpredictor's effectiveness with accuracy rates of approximately 98% and F1-scores ranging from 95% to 98% in precise CVE classification to ATT&CK techniques.
arXiv Detail & Related papers (2023-09-06T06:53:45Z) - Everything Perturbed All at Once: Enabling Differentiable Graph Attacks [61.61327182050706]
Graph neural networks (GNNs) have been shown to be vulnerable to adversarial attacks.
We propose a novel attack method called Differentiable Graph Attack (DGA) to efficiently generate effective attacks.
Compared to the state-of-the-art, DGA achieves nearly equivalent attack performance with 6 times less training time and 11 times smaller GPU memory footprint.
arXiv Detail & Related papers (2023-08-29T20:14:42Z) - Reinforcement Learning-based Black-Box Evasion Attacks to Link
Prediction in Dynamic Graphs [87.5882042724041]
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications.
We study the vulnerability of LPDG methods and propose the first practical black-box evasion attack.
arXiv Detail & Related papers (2020-09-01T01:04:49Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.