Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
- URL: http://arxiv.org/abs/2602.22433v1
- Date: Wed, 25 Feb 2026 21:44:57 GMT
- Title: Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
- Authors: Refat Othman,
- Abstract summary: This thesis addresses the problem of predicting known vulnerabilities from natural-language descriptions of cyberattacks.<n>We develop transformer-based sentence embedding methods that encode attack and vulnerability descriptions into semantic vector representations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern infrastructures rely on software systems that remain vulnerable to cyberattacks. These attacks frequently exploit vulnerabilities documented in repositories such as MITRE's Common Vulnerabilities and Exposures (CVE). However, Cyber Threat Intelligence resources, including MITRE ATT&CK and CVE, provide only partial coverage of attack-vulnerability relationships. Attack information often appears before vulnerabilities are formally linked, creating the need for automated methods that infer likely vulnerabilities directly from attack descriptions. This thesis addresses the problem of predicting known vulnerabilities from natural-language descriptions of cyberattacks. We develop transformer-based sentence embedding methods that encode attack and vulnerability descriptions into semantic vector representations, enabling similarity-based ranking and recommendation. Fourteen state-of-the-art transformer models were evaluated across four attack description types (Tactic, Technique, Procedure, and Attack Pattern). Results show that Technique descriptions in MITRE ATT&CK provide the strongest predictive signal. The multi-qa-mpnet-base-dot-v1 (MMPNet) model achieved the best performance due to its hybrid pre-training and optimization for semantic similarity. The approach was implemented in the VULDAT tool, which automatically links attacks to vulnerabilities. Manual validation revealed previously undocumented relationships in MITRE repositories. Evaluation on unseen cyberattack reports demonstrates that the models generalize beyond curated datasets and support proactive vulnerability awareness.
Related papers
- CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents [27.35968236632966]
LLM-based code interpreter agents are increasingly deployed in critical situations.<n>Existing benchmarks fail to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context.<n>We introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation.
arXiv Detail & Related papers (2026-02-23T06:41:41Z) - The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z) - From Attack Descriptions to Vulnerabilities: A Sentence Transformer-Based Approach [0.39134914399411086]
This paper evaluates 14 state-of-the-art sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks.<n>On average, 56% of the vulnerabilities identified by the MMPNet model are also represented within the CVE repository in conjunction with an attack.<n>A manual inspection of the results revealed the existence of 275 predicted links that were not documented in the MITRE repositories.
arXiv Detail & Related papers (2025-09-02T08:27:36Z) - The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks [0.4604003661048266]
Threat Modeling can provide critical support to cybersecurity professionals, enabling them to take timely action and allocate resources that could be used elsewhere.<n>Recently, there has been a pressing need for automated methods to assess attack descriptions and forecast the future consequences of cyberattacks.<n>This study examines how Natural Language Processing (NLP) and deep learning can be applied to analyze the potential impact of cyberattacks.
arXiv Detail & Related papers (2025-08-18T15:46:36Z) - Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security [63.41350337821108]
We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs)<n>SecTOW consists of two modules: a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO)<n>We show that SecTOW significantly improves security while preserving general performance.
arXiv Detail & Related papers (2025-07-29T17:39:48Z) - Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents.<n>The attack injects malicious behaviors into the model by modifying only the visual input.<n>We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - Defending Large Language Models against Jailbreak Attacks via Semantic
Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z) - Zero-shot learning approach to adaptive Cybersecurity using Explainable
AI [0.5076419064097734]
We present a novel approach to handle the alarm flooding problem faced by Cybersecurity systems like security information and event management (SIEM) and intrusion detection (IDS)
We apply a zero-shot learning method to machine learning (ML) by leveraging explanations for predictions of anomalies generated by a ML model.
In this approach, without any prior knowledge of attack, we try to identify it, decipher the features that contribute to classification and try to bucketize the attack in a specific category.
arXiv Detail & Related papers (2021-06-21T06:29:13Z) - An Automated, End-to-End Framework for Modeling Attacks From
Vulnerability Descriptions [46.40410084504383]
In order to derive a relevant attack graph, up-to-date information on known attack techniques should be represented as interaction rules.
We present a novel, end-to-end, automated framework for modeling new attack techniques from textual description of a security vulnerability.
arXiv Detail & Related papers (2020-08-10T19:27:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.