Predicting known Vulnerabilities from Attack News: A Transformer-Based Approach
- URL: http://arxiv.org/abs/2602.19606v1
- Date: Mon, 23 Feb 2026 08:47:48 GMT
- Title: Predicting known Vulnerabilities from Attack News: A Transformer-Based Approach
- Authors: Refat Othman, Diaeddin Rimawi, Bruno Rossi, Barbara Russo
- Abstract summary: This paper examines the process of predicting software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs). We propose a semantic similarity-based approach to generate a ranked list of the most likely CVEs corresponding to each news report. Experimental results show that the model attains a precision of 81 percent when employing threshold-based filtering.
- Score: 0.39134914399411086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the vulnerabilities exploited during cyberattacks is essential for enabling timely responses and effective mitigation in software security. This paper directly examines the process of predicting software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs), from unstructured descriptions of attacks reported in cybersecurity news articles. We propose a semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model to generate a ranked list of the most likely CVEs corresponding to each news report. To assess the accuracy of the predicted vulnerabilities, we implement four complementary validation methods: filtering predictions based on similarity thresholds, conducting manual validation, performing semantic comparisons with the first vulnerability explicitly mentioned in each report, and comparing against all CVEs referenced within the report. Experimental results, drawn from a dataset of 100 SecurityWeek news articles, demonstrate that the model attains a precision of 81 percent when employing threshold-based filtering. Manual evaluations report that 70 percent of the predictions are relevant, while comparisons with the initially mentioned CVEs reveal agreement rates of 80 percent with the first listed vulnerability and 78 percent across all referenced CVEs. In 57 percent of the news reports analyzed, at least one predicted vulnerability precisely matched a CVE-ID mentioned in the article. These findings underscore the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.
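The ranking and threshold-filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are made-up three-dimensional vectors, the CVE identifiers are placeholders, and the function names `rank_cves` and `precision` as well as the `top_k` and `threshold` values are assumptions chosen for illustration. In practice the vectors would presumably come from encoding the news text and CVE descriptions with the multi-qa-mpnet-base-dot-v1 model, which is designed for dot-product scoring.

```python
from typing import Dict, List, Sequence, Set, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_cves(
    news_vec: Sequence[float],
    cve_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every CVE against one news report, sort by descending
    similarity, keep the top_k, then drop scores below the threshold."""
    scored = [(cve_id, dot(news_vec, vec)) for cve_id, vec in cve_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(cve_id, s) for cve_id, s in scored[:top_k] if s >= threshold]


def precision(predicted: List[str], relevant: Set[str]) -> float:
    """Fraction of predicted CVEs that are judged relevant."""
    if not predicted:
        return 0.0
    return sum(1 for cve_id in predicted if cve_id in relevant) / len(predicted)


# Hypothetical toy embeddings (assumption: real ones are model outputs).
CVE_VECS = {
    "CVE-0000-0001": [1.0, 0.0, 0.0],
    "CVE-0000-0002": [0.5, 0.5, 0.0],
    "CVE-0000-0003": [0.0, 0.0, 1.0],
}
NEWS_VEC = [0.9, 0.1, 0.0]

ranked = rank_cves(NEWS_VEC, CVE_VECS, top_k=3, threshold=0.4)
predicted_ids = [cve_id for cve_id, _ in ranked]
# Assumed ground truth for the toy example: only the first CVE is relevant.
toy_precision = precision(predicted_ids, {"CVE-0000-0001"})
```

With these toy values the third CVE falls below the threshold and is filtered out, so the ranked list holds two candidates, one of which is relevant. Raising the threshold trades recall for precision, which is the effect the threshold-based validation in the abstract measures.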
Related papers
- Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers [0.0]
This thesis addresses the problem of predicting known vulnerabilities from natural-language descriptions of cyberattacks. We develop transformer-based sentence embedding methods that encode attack and vulnerability descriptions into semantic vector representations.
arXiv Detail & Related papers (2026-02-25T21:44:57Z)
- ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z)
- VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
- From Attack Descriptions to Vulnerabilities: A Sentence Transformer-Based Approach [0.39134914399411086]
This paper evaluates 14 state-of-the-art sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks. On average, 56% of the vulnerabilities identified by the MMPNet model are also represented within the CVE repository in conjunction with an attack. A manual inspection of the results revealed the existence of 275 predicted links that were not documented in the MITRE repositories.
arXiv Detail & Related papers (2025-09-02T08:27:36Z)
- VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation [6.576811224645293]
Graph Neural Networks (GNNs) can learn structural and logical code relationships in a data-driven manner. GNNs often learn 'spurious' correlations from superficial code similarities. We propose a unified framework for robust and interpretable vulnerability detection, called VISION.
arXiv Detail & Related papers (2025-08-26T11:20:39Z)
- VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification [49.1574468325115]
Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
arXiv Detail & Related papers (2025-07-04T14:28:14Z)
- Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores [52.92618442300405]
It is impossible to achieve exact, distribution-free conditional coverage in finite samples. We propose an alternative conformal prediction algorithm that targets coverage where it matters most.
arXiv Detail & Related papers (2025-01-17T12:01:56Z)
- Cybersecurity Defenses: Exploration of CVE Types through Attack Descriptions [1.0474508494260908]
VULDAT is a classification tool using a sentence transformer MPNET to identify system vulnerabilities from attack descriptions.
Our model was applied to 100 attack techniques from the ATT&CK repository and 685 issues from the CVE repository.
Our findings indicate that our model achieves the best performance with F1 score of 0.85, Precision of 0.86, and Recall of 0.83.
arXiv Detail & Related papers (2024-07-09T11:08:35Z)
- Unveiling Hidden Links Between Unseen Security Entities [3.7138962865789353]
VulnScopper is an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP).
We evaluate VulnScopper on two major security datasets, the National Vulnerability Database (NVD) and the Red Hat CVE database.
Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to Common Weakness Enumerations (CWEs) and Common Platform Enumerations (CPEs).
arXiv Detail & Related papers (2024-03-04T13:14:39Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision.
Our proposed BayesAug significantly reduces the false positive rate by over 12.50% compared with the previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.