CVSS-BERT: Explainable Natural Language Processing to Determine the
Severity of a Computer Security Vulnerability from its Description
- URL: http://arxiv.org/abs/2111.08510v1
- Date: Tue, 16 Nov 2021 14:31:09 GMT
- Authors: Mustafizur Shahid (IP Paris), Hervé Debar
- Abstract summary: Cybersecurity experts provide an analysis of the severity of a vulnerability using the Common Vulnerability Scoring System (CVSS)
We propose to leverage recent advances in the field of Natural Language Processing (NLP) to determine the CVSS vector and the associated severity score of a vulnerability in an explainable manner.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a new computer security vulnerability is publicly disclosed, only a
textual description of it is available. Cybersecurity experts later provide an
analysis of the severity of the vulnerability using the Common Vulnerability
Scoring System (CVSS). Specifically, the different characteristics of the
vulnerability are summarized into a vector (consisting of a set of metrics),
from which a severity score is computed. However, because of the high number of
vulnerabilities disclosed every day, this process requires a lot of manpower, and
several days may pass before a vulnerability is analyzed. We propose to
leverage recent advances in the field of Natural Language Processing (NLP) to
determine the CVSS vector and the associated severity score of a vulnerability
from its textual description in an explainable manner. For this purpose, we
trained multiple BERT classifiers, one for each metric composing the CVSS
vector. Experimental results show that our trained classifiers are able to
determine the value of the metrics of the CVSS vector with high accuracy. The
severity score computed from the predicted CVSS vector is also very close to
the real severity score attributed by a human expert. For explainability
purposes, a gradient-based input saliency method was used to determine the most
relevant input words for a given prediction made by our classifiers. Often, the
top relevant words include terms in agreement with the rationales of a human
cybersecurity expert, making the explanation comprehensible for end-users.
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
- Vulnerability of LLMs to Vertically Aligned Text Manipulations
Large language models (LLMs) have become highly effective at performing text classification tasks.
Modifying input formats, such as vertically aligning words for encoder-based models, can substantially lower accuracy in text classification tasks.
Do decoder-based LLMs exhibit similar vulnerabilities to vertically formatted text input?
arXiv Detail & Related papers (2024-10-26T00:16:08Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
- Gotta Catch 'em All: Aggregating CVSS Scores
We propose a CVSS aggregation algorithm that integrates information about the functionality of the SUT, exploitation difficulty, existence of exploits, and the context where the SUT operates.
The aggregation algorithm was applied to OpenPLC V3, showing that it is capable of filtering out vulnerabilities that cannot be exploited in the real conditions of deployment.
arXiv Detail & Related papers (2023-10-03T14:04:40Z)
- Automated CVE Analysis for Threat Prioritization and Impact Prediction
We introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization.
CVEDrill accurately estimates the Common Vulnerability Scoring System (CVSS) vector for precise threat mitigation and priority ranking.
It seamlessly automates the classification of CVEs into the appropriate Common Weakness Enumeration (CWE) hierarchy classes.
arXiv Detail & Related papers (2023-09-06T14:34:03Z)
- Vulnerability Clustering and other Machine Learning Applications of Semantic Vulnerability Embeddings
We investigated different types of semantic vulnerability embeddings based on natural language processing (NLP) techniques.
We also evaluated their use as a foundation for machine learning applications that can support cyber-security researchers and analysts.
The particular applications we explored and briefly summarize are clustering, classification, and visualization.
arXiv Detail & Related papers (2023-08-23T21:39:48Z)
- Common Vulnerability Scoring System Prediction based on Open Source Intelligence Information Sources
This work provides a classification of the National Vulnerability Database's reference texts based on the suitability and crawlability of their texts.
While we identified that the overall influence of the additional texts is negligible, we outperformed the state-of-the-art with our Deep Learning prediction models.
arXiv Detail & Related papers (2022-10-05T10:54:15Z)
- Spotting adversarial samples for speaker verification by neural vocoders
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discrimination between genuine and adversarial samples.
Our codes will be made open-source for future works to do comparison.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.