Unveiling Hidden Links Between Unseen Security Entities
- URL: http://arxiv.org/abs/2403.02014v1
- Date: Mon, 4 Mar 2024 13:14:39 GMT
- Title: Unveiling Hidden Links Between Unseen Security Entities
- Authors: Daniel Alfasi, Tal Shapira, Anat Bremler Barr
- Abstract summary: VulnScopper is an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP).
We evaluate VulnScopper on two major security datasets, the National Vulnerability Database (NVD) and the Red Hat CVE database.
Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to Common Weakness Enumerations (CWEs) and Common Platform Enumerations (CPEs).
- Score: 3.7138962865789353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of software vulnerabilities poses a significant challenge
for security databases and analysts tasked with their timely identification,
classification, and remediation. With the National Vulnerability Database (NVD)
reporting an ever-increasing number of vulnerabilities, the traditional manual
analysis becomes untenably time-consuming and prone to errors. This paper
introduces VulnScopper, an innovative approach that utilizes multi-modal
representation learning, combining Knowledge Graphs (KG) and Natural Language
Processing (NLP), to automate and enhance the analysis of software
vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined
with a Large Language Model (LLM), VulnScopper effectively handles unseen
entities, overcoming the limitations of previous KG approaches. We evaluate
VulnScopper on two major security datasets, the NVD and the Red Hat CVE
database. Our method significantly improves the link prediction accuracy
between Common Vulnerabilities and Exposures (CVEs), Common Weakness
Enumeration (CWEs), and Common Platform Enumerations (CPEs). Our results show
that VulnScopper outperforms existing methods, achieving up to 78% Hits@10
accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement
over large language models in predicting CWE labels based on the Red Hat
database. Based on the NVD, only 6.37% of the linked CPEs are published within
the first 30 days; many of them are related to critical and high-risk
vulnerabilities which, according to multiple compliance frameworks (such as
CISA and PCI), should be remediated within 15-30 days. Our model can uncover
new products linked to vulnerabilities, reducing remediation time and improving
vulnerability management. We analyzed several CVEs from 2023 to showcase this
ability.
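Hits@10, the metric the abstract reports, scores a link-prediction model by checking whether the true entity appears among the model's ten highest-ranked candidates. A minimal sketch of the computation, using made-up candidate rankings rather than actual VulnScopper output:

```python
def hits_at_k(ranked_candidates, true_entity, k=10):
    """Return 1 if the true entity appears in the top-k ranked candidates, else 0."""
    return int(true_entity in ranked_candidates[:k])

# Hypothetical example: for each CVE, a model-produced ranking of candidate
# CPE identifiers and the ground-truth linked CPE.
predictions = [
    (["cpe:a", "cpe:b", "cpe:c"], "cpe:b"),   # hit (true entity at rank 2)
    (["cpe:x", "cpe:y", "cpe:z"], "cpe:q"),   # miss (true entity not ranked)
]
score = sum(hits_at_k(r, t, k=10) for r, t in predictions) / len(predictions)
print(score)  # 0.5
```

Averaging this indicator over all test queries gives the Hits@10 accuracy; the paper's 78% figure means the correct CPE or CWE was among the top ten predictions for 78% of CVEs.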
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection.
By incorporating our augmented dataset in fine-tuning a series of representative code pre-trained models, up to 10.1% increase in accuracy and 23.6% increase in F1 can be achieved.
arXiv Detail & Related papers (2024-09-30T21:44:05Z) - SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection [23.7268575752712]
Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems.
We propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD.
arXiv Detail & Related papers (2024-09-02T00:49:02Z) - PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - CPE-Identifier: Automated CPE identification and CVE summaries annotation with Deep Learning and NLP [0.28281736775010774]
We propose the CPE-Identifier system, an automated CPE annotating and extracting system, from the CVE summaries.
The system can be used as a tool to identify CPE entities from new CVE text inputs.
We also apply Natural Language Processing (NLP) Named Entity Recognition (NER) to identify new technical jargons in the text.
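The CPE-Identifier paper uses deep-learning NER for this extraction; as a much simpler illustration of the task, the sketch below pulls well-formed CPE 2.3 URIs out of free text with a regular expression. The sample summary is invented, not a real CVE record.

```python
import re

# A CPE 2.3 formatted string has a part field (a/o/h) followed by exactly
# ten colon-separated components (vendor, product, version, ...).
CPE_RE = re.compile(r"cpe:2\.3:[aho](?::[^:\s]+){10}")

def extract_cpes(text):
    """Return all CPE 2.3 strings found in the given text."""
    return CPE_RE.findall(text)

summary = ("Buffer overflow in ExampleApp 1.2 affects "
           "cpe:2.3:a:example:exampleapp:1.2:*:*:*:*:*:*:* on Linux.")
print(extract_cpes(summary))  # ['cpe:2.3:a:example:exampleapp:1.2:*:*:*:*:*:*:*']
```

A regex only finds CPEs already written in canonical form; the point of NER-based systems like CPE-Identifier is to recover product mentions ("ExampleApp 1.2") that appear as plain prose.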
arXiv Detail & Related papers (2024-05-22T12:05:17Z) - FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z) - Automated CVE Analysis for Threat Prioritization and Impact Prediction [4.540236408836132]
We introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization.
CVEDrill accurately estimates the Common Vulnerability Scoring System (CVSS) vector for precise threat mitigation and priority ranking.
It seamlessly automates the classification of CVEs into the appropriate Common Weakness Enumeration (CWE) hierarchy classes.
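CVEDrill predicts the CVSS vector with a learned model; as a worked illustration of what such a vector encodes, the sketch below computes the CVSS v3.1 base score from a vector's metric values using the published formula for unchanged scope. The example vector is illustrative, not taken from the paper.

```python
import math

# CVSS v3.1 metric weights (scope unchanged), per the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                         # Attack Complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}              # Privileges Required
UI = {"N": 0.85, "R": 0.62}                         # User Interaction
CIA = {"N": 0.0, "L": 0.22, "H": 0.56}              # Confidentiality/Integrity/Availability

def base_score(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for an unchanged-scope vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    # Round up to one decimal place, as the spec requires.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10

# CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "H", "H", "H"))  # 9.8 (critical)
```

Estimating the vector components from a CVE description, as CVEDrill does, is the hard part; once the vector is known, the score itself is deterministic arithmetic like the above.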
arXiv Detail & Related papers (2023-09-06T14:34:03Z) - G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks
through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z) - Confidence Attention and Generalization Enhanced Distillation for
Continuous Video Domain Adaptation [62.458968086881555]
Continuous Video Domain Adaptation (CVDA) is a scenario where a source model is required to adapt to a series of individually available changing target domains.
We propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address the challenge in CVDA.
arXiv Detail & Related papers (2023-03-18T16:40:10Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - V2W-BERT: A Framework for Effective Hierarchical Multiclass
Classification of Software Vulnerabilities [7.906207218788341]
We present a novel Transformer-based learning framework (V2W-BERT) in this paper.
By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches.
We achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data.
arXiv Detail & Related papers (2021-02-23T05:16:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.