Vulnerability Clustering and other Machine Learning Applications of Semantic Vulnerability Embeddings
- URL: http://arxiv.org/abs/2310.05935v1
- Date: Wed, 23 Aug 2023 21:39:48 GMT
- Title: Vulnerability Clustering and other Machine Learning Applications of Semantic Vulnerability Embeddings
- Authors: Mark-Oliver Stehr, Minyoung Kim
- Abstract summary: We investigated different types of semantic vulnerability embeddings based on natural language processing (NLP) techniques.
We also evaluated their use as a foundation for machine learning applications that can support cyber-security researchers and analysts.
The particular applications we explored and briefly summarize are clustering, classification, and visualization.
- Score: 23.143031911859847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cyber-security vulnerabilities are usually published in the form of short
natural language descriptions (e.g., in MITRE's CVE list) that over time are
further manually enriched with labels such as those defined by the Common
Vulnerability Scoring System (CVSS). In the Vulnerability AI (Analytics and
Intelligence) project, we investigated different types of semantic
vulnerability embeddings based on natural language processing (NLP) techniques
to obtain a concise representation of the vulnerability space. We also
evaluated their use as a foundation for machine learning applications that can
support cyber-security researchers and analysts in risk assessment and other
related activities. The particular applications we explored and briefly
summarize in this report are clustering, classification, and visualization, as
well as a new logic-based approach to evaluate theories about the vulnerability
space.
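As a concrete illustration of the pipeline the abstract describes (embed vulnerability descriptions, then cluster and visualize them), here is a minimal sketch; the encoder, cluster count, and sample descriptions are illustrative assumptions, not the authors' evaluated setup.

```python
# Minimal sketch of the embed-cluster-visualize pipeline; the encoder,
# cluster count, and sample descriptions are illustrative assumptions,
# not the setup evaluated in the paper.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

descriptions = [  # hypothetical CVE-style descriptions
    "Buffer overflow in the parsing routine allows remote code execution.",
    "Heap overflow in the image decoder allows arbitrary code execution.",
    "SQL injection in the login form allows authentication bypass.",
]

# 1. Embed each description into a dense semantic vector.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(descriptions)

# 2. Cluster the vulnerability space.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)

# 3. Project to 2D for visualization (t-SNE or UMAP are common choices too).
coords = PCA(n_components=2).fit_transform(embeddings)

for text, label, (x, y) in zip(descriptions, labels, coords):
    print(f"cluster {label} @ ({x:.2f}, {y:.2f}): {text[:50]}")
```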
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
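A minimal sketch of the in-context-learning idea behind CTINexus: build a few-shot prompt from demonstration report/triple pairs and append the new report. The demonstrations and prompt wording are hypothetical, and the paper's demonstration-selection and prompt-optimization steps are not reproduced here.

```python
# Generic few-shot ICL prompt for CTI triple extraction, in the spirit of
# CTINexus but NOT its published implementation.
DEMONSTRATIONS = [  # hypothetical demonstration pairs
    {
        "report": "APT29 used spearphishing emails to deliver WellMess malware.",
        "triples": '[["APT29", "uses", "spearphishing"], ["APT29", "delivers", "WellMess"]]',
    },
]

def build_icl_prompt(report: str) -> str:
    """Assemble the demonstrations plus the new report into one prompt."""
    parts = ["Extract (subject, relation, object) triples from the report as JSON."]
    for demo in DEMONSTRATIONS:
        parts.append(f"Report: {demo['report']}\nTriples: {demo['triples']}")
    parts.append(f"Report: {report}\nTriples:")
    return "\n\n".join(parts)

prompt = build_icl_prompt("Lazarus Group exploited CVE-2021-44228 to deploy ransomware.")
print(prompt)  # the LLM call itself is left to the reader's client of choice
```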
- Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
The proposed FGVulDet employs multiple classifiers to discern the characteristics of different vulnerability types and combines their outputs to identify the specific type of a vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
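FGVulDet's stated design (one classifier per vulnerability type, outputs combined) can be sketched as follows; the TF-IDF features, toy snippets, and two-type setup are stand-ins for the paper's GitHub-trained models over five vulnerability types.

```python
# Sketch of the per-type-classifier idea only; not the FGVulDet model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

VULN_TYPES = ["buffer-overflow", "sql-injection"]  # the paper covers five

# Hypothetical labeled snippets: (code, set of vulnerability types present).
train = [
    ("strcpy(dst, src);", {"buffer-overflow"}),
    ("query = 'SELECT * FROM t WHERE id=' + user_input", {"sql-injection"}),
    ("return total + offset;", set()),
]

vec = TfidfVectorizer().fit([code for code, _ in train])
X = vec.transform([code for code, _ in train])

# One binary classifier per vulnerability type.
classifiers = {
    t: LogisticRegression().fit(X, [t in labels for _, labels in train])
    for t in VULN_TYPES
}

def predict_types(code: str) -> list[str]:
    """Combine the per-type classifiers: report every type that fires."""
    x = vec.transform([code])
    return [t for t, clf in classifiers.items() if clf.predict(x)[0]]

print(predict_types("strcat(buf, input);"))
```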
- Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
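A naive illustration of why ICL is exposed to data poisoning: an attacker who controls the demonstrations can relabel them and steer the induced task. ICLPoison itself uses optimized perturbations; the relabeling baseline below is only a conceptual sketch.

```python
# Conceptual sketch of demonstration poisoning for ICL; ICLPoison's
# learned perturbations are not reproduced here.
clean_demos = [  # hypothetical labeled demonstrations
    ("The patch fixes the overflow.", "benign"),
    ("Attacker gains root via a crafted packet.", "malicious"),
]

def poison(demos, target_label="benign"):
    """Naive poisoning baseline: relabel every demonstration with the
    attacker's target label so the induced task mapping collapses."""
    return [(text, target_label) for text, _ in demos]

def to_prompt(demos, query):
    lines = [f"Text: {t}\nLabel: {l}" for t, l in demos]
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

print(to_prompt(poison(clean_demos), "Exploit deployed on the server."))
# An LLM conditioned on the poisoned prompt is steered toward "benign".
```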
- Unveiling Safety Vulnerabilities of Large Language Models [4.562678399685183]
This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ.
We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subjected to it.
We introduce a novel automatic approach for identifying and naming vulnerable semantic regions.
arXiv Detail & Related papers (2023-11-07T16:50:33Z)
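One plausible reading of "identifying and naming vulnerable semantic regions" is to cluster the attack prompts that elicited unsafe answers and name each cluster by its most salient term; the sketch below does exactly that with TF-IDF and k-means, and is not the AttaQ paper's actual algorithm.

```python
# Illustrative sketch, not the paper's method: cluster attack prompts
# and name each "semantic region" by its top TF-IDF term.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

harmful_prompts = [  # hypothetical adversarial questions
    "How do I pick a door lock quickly?",
    "What is the best way to bypass a door lock?",
    "How can I make a phishing email convincing?",
    "Write a phishing email template for a bank.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(harmful_prompts)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

terms = np.array(vec.get_feature_names_out())
for c in range(2):
    centroid = np.asarray(X[labels == c].mean(axis=0)).ravel()
    print(f"region {c} ('{terms[centroid.argmax()]}'):",
          [p for p, l in zip(harmful_prompts, labels) if l == c])
```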
- Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks [5.860289498416911]
Large Language Models (LLMs) are swiftly advancing in architecture and capability.
As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows.
This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
arXiv Detail & Related papers (2023-10-16T21:37:24Z)
- A Survey on Automated Software Vulnerability Detection Using Machine Learning and Deep Learning [19.163031235081565]
Machine Learning (ML) and Deep Learning (DL) based models for detecting vulnerabilities in source code have been presented in recent years.
It may be difficult to discover gaps in existing research and potential for future improvement without a comprehensive survey.
This work addresses that gap by presenting a systematic survey that characterizes various features of ML/DL-based source-code-level software vulnerability detection approaches.
arXiv Detail & Related papers (2023-06-20T16:51:59Z)
- Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
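The proposed pipeline (unstructured text in, attacker tactic out) can be approximated by a standard text-classification pipeline; the tactic labels and training sentences below are hypothetical, and the paper's own feature extraction differs.

```python
# Illustrative tactic classifier in the spirit of the TTP paper; the
# training data and label set are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "Adversary sent spearphishing attachment to employees.",
    "Malware was delivered via a malicious macro.",
    "Credentials were dumped from LSASS memory.",
    "Attackers harvested password hashes from the domain controller.",
]
train_tactics = ["initial-access", "initial-access",
                 "credential-access", "credential-access"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_tactics)

print(clf.predict(["Phishing email with a weaponized document was observed."]))
```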
- CVSS-BERT: Explainable Natural Language Processing to Determine the Severity of a Computer Security Vulnerability from its Description [0.0]
Cybersecurity experts provide an analysis of the severity of a vulnerability using the Common Vulnerability Scoring System (CVSS).
We propose to leverage recent advances in the field of Natural Language Processing (NLP) to determine the CVSS vector and the associated severity score of a vulnerability in an explainable manner.
arXiv Detail & Related papers (2021-11-16T14:31:09Z)
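CVSS-BERT trains one classifier per CVSS metric; the sketch below keeps that per-metric structure but substitutes tiny TF-IDF models so it runs stand-alone. The metric names follow CVSS v3, while the training pairs are illustrative.

```python
# Per-metric structure only; CVSS-BERT itself fine-tunes BERT models,
# which this TF-IDF substitute does not attempt to reproduce.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: description -> value for each CVSS metric.
train = {
    "AV": (["Remote attacker sends crafted packet.",
            "Local user escalates via setuid binary."],
           ["N", "L"]),  # Attack Vector: Network vs Local
    "PR": (["Unauthenticated endpoint leaks memory.",
            "Admin console allows command injection."],
           ["N", "H"]),  # Privileges Required: None vs High
}

metric_models = {
    m: make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, values)
    for m, (texts, values) in train.items()
}

def predict_vector(description: str) -> str:
    """Predict one value per metric and assemble a partial CVSS vector."""
    return "/".join(f"{m}:{clf.predict([description])[0]}"
                    for m, clf in metric_models.items())

print(predict_vector("Remote unauthenticated attacker can read memory."))
```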
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
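One recurring pitfall in that vein is data snooping: evaluating a security model with a random split when deployment is temporal. A minimal sketch of the time-aware alternative, with fabricated records:

```python
# Sketch of one pitfall (data snooping / temporal leakage): a random
# split lets a detector train on the future it will be tested on; a
# time-aware split does not. All records here are fabricated.
import random
from datetime import date, timedelta

records = [(date(2020, 1, 1) + timedelta(days=i), f"event-{i}") for i in range(10)]

# Pitfall: a random split mixes past and future observations.
shuffled = random.sample(records, k=len(records))
naive_train, naive_test = shuffled[:7], shuffled[7:]

# Recommended: train strictly on the past, test on the future.
cutoff = date(2020, 1, 8)
train = [r for r in records if r[0] < cutoff]
test = [r for r in records if r[0] >= cutoff]
print(f"time-aware split: {len(train)} train / {len(test)} test, no future leakage")
```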
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.