Unveiling Hidden Links Between Unseen Security Entities
- URL: http://arxiv.org/abs/2403.02014v1
- Date: Mon, 4 Mar 2024 13:14:39 GMT
- Title: Unveiling Hidden Links Between Unseen Security Entities
- Authors: Daniel Alfasi, Tal Shapira, Anat Bremler Barr
- Abstract summary: VulnScopper is an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP).
We evaluate VulnScopper on two major security datasets, the National Vulnerability Database (NVD) and the Red Hat CVE database.
Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to Common Weakness Enumerations (CWEs) and Common Platform Enumerations (CPEs).
- Score: 3.7138962865789353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of software vulnerabilities poses a significant challenge
for security databases and analysts tasked with their timely identification,
classification, and remediation. With the National Vulnerability Database (NVD)
reporting an ever-increasing number of vulnerabilities, the traditional manual
analysis becomes untenably time-consuming and prone to errors. This paper
introduces VulnScopper, an innovative approach that utilizes multi-modal
representation learning, combining Knowledge Graphs (KG) and Natural Language
Processing (NLP), to automate and enhance the analysis of software
vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined
with a Large Language Model (LLM), VulnScopper effectively handles unseen
entities, overcoming the limitations of previous KG approaches. We evaluate
VulnScopper on two major security datasets, the NVD and the Red Hat CVE
database. Our method significantly improves the link prediction accuracy
between Common Vulnerabilities and Exposures (CVEs), Common Weakness
Enumeration (CWEs), and Common Platform Enumerations (CPEs). Our results show
that VulnScopper outperforms existing methods, achieving up to 78% Hits@10
accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement
over large language models in predicting CWE labels based on the Red Hat
database. Based on the NVD, only 6.37% of the linked CPEs are published within
the first 30 days; many of them are related to critical and high-risk
vulnerabilities which, according to multiple compliance frameworks (such as
CISA and PCI), should be remediated within 15-30 days. Our model can uncover
new products linked to vulnerabilities, reducing remediation time and improving
vulnerability management. We analyzed several CVEs from 2023 to showcase this
ability.
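Hits@10, the metric the abstract reports, scores a link-prediction model by checking whether the true entity appears among the model's ten highest-ranked candidates. A minimal sketch of the computation, using made-up candidate rankings rather than actual VulnScopper output:

```python
def hits_at_k(ranked_candidates, true_entity, k=10):
    """Return 1 if the true entity appears in the top-k ranked candidates, else 0."""
    return int(true_entity in ranked_candidates[:k])

# Hypothetical example: for each CVE, a model-produced ranking of candidate
# CPE identifiers and the ground-truth linked CPE.
predictions = [
    (["cpe:a", "cpe:b", "cpe:c"], "cpe:b"),   # hit (true entity at rank 2)
    (["cpe:x", "cpe:y", "cpe:z"], "cpe:q"),   # miss (true entity not ranked)
]
score = sum(hits_at_k(r, t, k=10) for r, t in predictions) / len(predictions)
print(score)  # 0.5
```

Averaging this indicator over all test queries gives the Hits@10 accuracy; the paper's 78% figure means the correct CPE or CWE was among the top ten predictions for 78% of CVEs.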
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection.
By incorporating our augmented dataset in fine-tuning a series of representative code pre-trained models, up to 10.1% increase in accuracy and 23.6% increase in F1 can be achieved.
arXiv Detail & Related papers (2024-09-30T21:44:05Z) - SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection [23.7268575752712]
Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems.
We propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD.
arXiv Detail & Related papers (2024-09-02T00:49:02Z) - PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - CPE-Identifier: Automated CPE identification and CVE summaries annotation with Deep Learning and NLP [0.28281736775010774]
We propose the CPE-Identifier system, an automated CPE annotating and extracting system, from the CVE summaries.
The system can be used as a tool to identify CPE entities from new CVE text inputs.
We also apply Natural Language Processing (NLP) Named Entity Recognition (NER) to identify new technical jargons in the text.
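The CPE-Identifier paper uses deep-learning NER for this extraction; as a much simpler illustration of the task, the sketch below pulls well-formed CPE 2.3 URIs out of free text with a regular expression. The sample summary is invented, not a real CVE record.

```python
import re

# A CPE 2.3 formatted string has a part field (a/o/h) followed by exactly
# ten colon-separated components (vendor, product, version, ...).
CPE_RE = re.compile(r"cpe:2\.3:[aho](?::[^:\s]+){10}")

def extract_cpes(text):
    """Return all CPE 2.3 strings found in the given text."""
    return CPE_RE.findall(text)

summary = ("Buffer overflow in ExampleApp 1.2 affects "
           "cpe:2.3:a:example:exampleapp:1.2:*:*:*:*:*:*:* on Linux.")
print(extract_cpes(summary))  # ['cpe:2.3:a:example:exampleapp:1.2:*:*:*:*:*:*:*']
```

A regex only finds CPEs already written in canonical form; the point of NER-based systems like CPE-Identifier is to recover product mentions ("ExampleApp 1.2") that appear as plain prose.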
arXiv Detail & Related papers (2024-05-22T12:05:17Z) - FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z) - Automated CVE Analysis for Threat Prioritization and Impact Prediction [4.540236408836132]
We introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization.
CVEDrill accurately estimates the Common Vulnerability Scoring System (CVSS) vector for precise threat mitigation and priority ranking.
It seamlessly automates the classification of CVEs into the appropriate Common Weakness Enumeration (CWE) hierarchy classes.
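CVEDrill predicts the CVSS vector with a learned model; as a worked illustration of what such a vector encodes, the sketch below computes the CVSS v3.1 base score from a vector's metric values using the published formula for unchanged scope. The example vector is illustrative, not taken from the paper.

```python
import math

# CVSS v3.1 metric weights (scope unchanged), per the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                         # Attack Complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}              # Privileges Required
UI = {"N": 0.85, "R": 0.62}                         # User Interaction
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}              # Confidentiality/Integrity/Availability

def base_score(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for an unchanged-scope vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    # Round up to one decimal place, as the spec requires.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10

# CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "H", "H", "H"))  # 9.8 (critical)
```

Estimating the vector components from a CVE description, as CVEDrill does, is the hard part; once the vector is known, the score itself is deterministic arithmetic like the above.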
arXiv Detail & Related papers (2023-09-06T14:34:03Z) - G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks
through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z) - Confidence Attention and Generalization Enhanced Distillation for
Continuous Video Domain Adaptation [62.458968086881555]
Continuous Video Domain Adaptation (CVDA) is a scenario where a source model is required to adapt to a series of individually available changing target domains.
We propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address the challenge in CVDA.
arXiv Detail & Related papers (2023-03-18T16:40:10Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - V2W-BERT: A Framework for Effective Hierarchical Multiclass
Classification of Software Vulnerabilities [7.906207218788341]
We present a novel Transformer-based learning framework (V2W-BERT) in this paper.
By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches.
We achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data.
arXiv Detail & Related papers (2021-02-23T05:16:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.