TRACE: Timely Retrieval and Alignment for Cybersecurity Knowledge Graph Construction and Expansion
- URL: http://arxiv.org/abs/2602.11211v1
- Date: Wed, 11 Feb 2026 06:54:21 GMT
- Title: TRACE: Timely Retrieval and Alignment for Cybersecurity Knowledge Graph Construction and Expansion
- Authors: Zijing Xu, Ziwei Ning, Tiancheng Hu, Jianwei Zhuge, Yangyang Wang, Jiahao Cao, Mingwei Xu,
- Abstract summary: TRACE is a framework designed to integrate structured and unstructured cybersecurity data sources.<n>TRACE integrates knowledge from 24 structured databases and 3 categories of unstructured data, including APT reports, papers, and repair notices.
- Score: 22.605362022279497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid evolution of cyber threats has highlighted significant gaps in security knowledge integration. Cybersecurity Knowledge Graphs (CKGs) relying on structured data inherently exhibit hysteresis, as the timely incorporation of rapidly evolving unstructured data remains limited, potentially leading to the omission of critical insights for risk analysis. To address these limitations, we introduce TRACE, a framework designed to integrate structured and unstructured cybersecurity data sources. TRACE integrates knowledge from 24 structured databases and 3 categories of unstructured data, including APT reports, papers, and repair notices. Leveraging Large Language Models (LLMs), TRACE facilitates efficient entity extraction and alignment, enabling continuous updates to the CKG. Evaluations demonstrate that TRACE achieves a 1.8x increase in node coverage compared to existing CKGs. TRACE attains the precision of 86.08%, the recall of 76.92%, and the F1 score of 81.24% in entity extraction, surpassing the best-known LLM-based baselines by 7.8%. Furthermore, our entity alignment methods effectively harmonize entities with existing knowledge structures, enhancing the integrity and utility of the CKG. With TRACE, threat hunters and attack analysts gain real-time, holistic insights into vulnerabilities, attack methods, and defense technologies.
Related papers
- Autonomous Chain-of-Thought Distillation for Graph-Based Fraud Detection [73.9189065770752]
Graph-based fraud detection on text-attributed graphs (TAGs) requires jointly modeling rich textual semantics and relational dependencies.<n>We propose FraudCoT, a unified framework that advances TAG-based fraud detection through autonomous, graph-aware chain-of-thought (CoT) reasoning and scalable LLM-GNN co-training.
arXiv Detail & Related papers (2026-01-30T13:12:12Z) - Task-Agnostic Learnable Weighted-Knowledge Base Scheme for Robust Semantic Communications [52.36313868773825]
We propose a task-agnostic learnable weighted-knowledge base semantic communication (TALSC) framework for robust image transmission.<n>The framework incorporates a sample confidence module (SCM) as meta-learner and the semantic coding networks as learners.<n> Simulations demonstrate that the TALSC framework effectively mitigates the effects of flipping noise and class imbalance in task-agnostic image semantic communication.
arXiv Detail & Related papers (2025-09-15T07:10:21Z) - Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling [53.11315884128402]
Temporal Knowledge Graph (TKG) completion models traditionally assume access to the entire graph during training.<n>We present an incremental training framework specifically designed for TKGs, aiming to address entities that are either not observed during training or have sparse connections.<n>Our approach combines a model-agnostic enhancement layer with a weighted sampling strategy, that can be augmented to and improve any existing TKG completion method.
arXiv Detail & Related papers (2025-07-25T06:02:48Z) - FORGE: An LLM-driven Framework for Large-Scale Smart Contract Vulnerability Dataset Construction [34.20628333535654]
FORGE is the first automated approach for constructing smart contract vulnerability datasets.<n>We generate a dataset comprising 81,390 solidity files and 27,497 vulnerability findings across 296 CWE categories.<n>Results reveal the significant limitations in current detection capabilities.
arXiv Detail & Related papers (2025-06-23T16:03:16Z) - Streamlining Security Vulnerability Triage with Large Language Models [0.786186571320448]
We present CASEY, a novel approach that automates the identification of Common Weaknessions (CWEs) of security bugs and assesses their severity.<n>Casey achieved a CWE identification accuracy of 68%, a severity identification accuracy of 73.6%, and a combined accuracy of 51.2%.
arXiv Detail & Related papers (2025-01-31T06:02:24Z) - CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.<n>Current CTI knowledge extraction methods lack flexibility and generalizability.<n>We propose CTINexus, a novel framework for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations.<n>We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations.<n>Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z) - KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment [38.312774244521]
Cyber threat intelligence (CTI) is a crucial tool to prevent sophisticated, organized, and weaponized cyber attacks.<n>We propose Knowledge Graph-based Verifier (KGV), the first framework integrating large language models (LLMs) with simple structured knowledge graphs (KGs) for automated CTI credibility assessment.<n> Experimental results demonstrate that our KGV outperforms state-of-the-art fact reasoning methods on the CTI-200 dataset, achieving a 5.7% improvement in F1.
arXiv Detail & Related papers (2024-08-15T11:32:46Z) - KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge [63.19837262782962]
Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph.
This study introduces KG-FIT, which builds a semantically coherent hierarchical structure of entity clusters.
Experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods.
arXiv Detail & Related papers (2024-05-26T03:04:26Z) - Unveiling Hidden Links Between Unseen Security Entities [3.7138962865789353]
VulnScopper is an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Processing (NLP)
We evaluate VulnScopper on two major security datasets, the National Vulnerability Database (NVD) and the Red Hat CVE database.
Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to Common Vulnerabilities and Exposures (CWEs), and Common Platform Languageions (CPEs)
arXiv Detail & Related papers (2024-03-04T13:14:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.