KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment
- URL: http://arxiv.org/abs/2408.08088v1
- Date: Thu, 15 Aug 2024 11:32:46 GMT
- Title: KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment
- Authors: Zongzong Wu, Fengxiao Tang, Ming Zhao, Yufeng Li
- Abstract summary: We propose a knowledge graph-based verifier (KGV), a Cyber Threat Intelligence (CTI) quality assessment framework.
Our approach introduces Large Language Models (LLMs) to automatically extract OSCTI key claims to be verified.
To fill the gap in the research field, we created and made public the first dataset for threat intelligence assessment from heterogeneous sources.
- Score: 38.312774244521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyber threat intelligence is a critical tool that many organizations and individuals use to protect themselves from sophisticated, organized, persistent, and weaponized cyber attacks. However, few studies have focused on the quality assessment of threat intelligence provided by intelligence platforms, and this work still requires manual analysis by cybersecurity experts. In this paper, we propose a knowledge graph-based verifier (KGV), a novel Cyber Threat Intelligence (CTI) quality assessment framework that combines knowledge graphs and Large Language Models (LLMs). Our approach uses LLMs to automatically extract the key claims of open-source CTI (OSCTI) to be verified and checks them against a knowledge graph whose nodes are paragraphs. This differs from the traditional approach of constructing complex knowledge graphs with entities as nodes: by using paragraphs as nodes and semantic similarity as edges, it enhances the model's semantic understanding while simplifying labeling requirements. Additionally, to fill a gap in the research field, we created and released the first dataset for threat intelligence assessment from heterogeneous sources. To the best of our knowledge, this work is the first to create a dataset for threat intelligence reliability verification, providing a reference for future research. Experimental results show that KGV significantly improves the performance of LLMs in intelligence quality assessment. Compared with traditional methods, our approach greatly reduces the amount of data annotation required while the model still exhibits strong reasoning capabilities. Finally, our method achieves XXX accuracy in network threat assessment.
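The pipeline sketched in the abstract (paragraphs as graph nodes, semantic similarity as edges, LLM-extracted key claims checked against that graph) can be illustrated roughly as follows. This is a minimal sketch under assumed choices, namely a sentence-transformers encoder, a cosine-similarity threshold for adding edges, and a simple top-k retrieval step; it is not the authors' KGV implementation, and the final support/contradiction judgment over the retrieved paragraphs would still be made by an LLM.

```python
# Minimal sketch of a paragraph-node knowledge graph with semantic-similarity
# edges, in the spirit of the KGV framework described above. The encoder,
# threshold, and retrieval step are illustrative assumptions, not the
# authors' implementation.
import networkx as nx
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def build_paragraph_graph(paragraphs, sim_threshold=0.6):
    """Nodes are report paragraphs; edges link semantically similar ones."""
    embeddings = model.encode(paragraphs, convert_to_tensor=True)
    graph = nx.Graph()
    for i, text in enumerate(paragraphs):
        graph.add_node(i, text=text)
    sims = util.cos_sim(embeddings, embeddings)
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            score = float(sims[i][j])
            if score >= sim_threshold:  # keep only strong semantic links
                graph.add_edge(i, j, weight=score)
    return graph, embeddings

def retrieve_evidence(claim, graph, embeddings, top_k=3):
    """Return the paragraphs most similar to an LLM-extracted key claim.
    A downstream LLM (or NLI model) would judge support vs. contradiction."""
    claim_emb = model.encode(claim, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, embeddings)[0]
    ranked = sorted(range(len(scores)), key=lambda i: float(scores[i]), reverse=True)
    return [(i, graph.nodes[i]["text"], float(scores[i])) for i in ranked[:top_k]]

# Toy paragraphs standing in for a hypothetical OSCTI report.
paragraphs = [
    "APT-X delivered the payload through a spear-phishing attachment.",
    "The campaign targeted energy-sector organizations in Europe.",
    "No exploitation of the reported vulnerability was observed in this campaign.",
]
graph, embs = build_paragraph_graph(paragraphs)
for idx, text, score in retrieve_evidence("APT-X used spear phishing.", graph, embs):
    print(f"node {idx} (sim={score:.2f}): {text}")
```

Using paragraphs rather than fine-grained entities as nodes is what keeps the labeling burden low: no entity or relation annotation is needed to build the graph, only an off-the-shelf sentence encoder.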
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
- Cyber Knowledge Completion Using Large Language Models [1.4883782513177093]
Integrating the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface.
Assessing the risks of CPSs is increasingly difficult due to incomplete and outdated cybersecurity knowledge.
Recent advancements in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion.
arXiv Detail & Related papers (2024-09-24T15:20:39Z)
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs)
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z)
- Constructing a Knowledge Graph from Textual Descriptions of Software Vulnerabilities in the National Vulnerability Database [3.0724051098062097]
We present a new method for constructing a vulnerability knowledge graph from information in the National Vulnerability Database (NVD).
Our approach combines named entity recognition (NER), relation extraction (RE), and entity prediction using a combination of neural models, rules, and knowledge graph embeddings.
We demonstrate how our method helps to fix missing entities in knowledge graphs used for cybersecurity and evaluate the performance.
arXiv Detail & Related papers (2023-04-30T04:23:40Z)
- Recognizing and Extracting Cybersecurity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)