LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
- URL: http://arxiv.org/abs/2506.12100v2
- Date: Wed, 03 Sep 2025 17:18:40 GMT
- Title: LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
- Authors: Reza Fayyazi, Michael Zuzak, Shanchieh Jay Yang,
- Abstract summary: Large Language Models (LLMs) are increasingly used for cybersecurity threat analysis, but their deployment in security-sensitive environments raises trust and safety concerns.<n>This work proposes Embedding Attribution (LEA) to analyze the generated responses for vulnerability exploitation analysis.<n>Our results demonstrate LEA's ability to detect clear distinctions between non-retrieval, generic-retrieval, and valid-retrieval scenarios with over 95% accuracy on larger models.
- Score: 1.3543506826034255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly used for cybersecurity threat analysis, but their deployment in security-sensitive environments raises trust and safety concerns. With over 21,000 vulnerabilities disclosed in 2025, manual analysis is infeasible, making scalable and verifiable AI support critical. When querying LLMs, dealing with emerging vulnerabilities is challenging as they have a training cut-off date. While Retrieval-Augmented Generation (RAG) can inject up-to-date context to alleviate the cut-off date limitation, it remains unclear how much LLMs rely on retrieved evidence versus the model's internal knowledge, and whether the retrieved information is meaningful or even correct. This uncertainty could mislead security analysts, mis-prioritize patches, and increase security risks. Therefore, this work proposes LLM Embedding-based Attribution (LEA) to analyze the generated responses for vulnerability exploitation analysis. More specifically, LEA quantifies the relative contribution of internal knowledge vs. retrieved content in the generated responses. We evaluate LEA on 500 critical vulnerabilities disclosed between 2016 and 2025, across three RAG settings -- valid, generic, and incorrect -- using three state-of-the-art LLMs. Our results demonstrate LEA's ability to detect clear distinctions between non-retrieval, generic-retrieval, and valid-retrieval scenarios with over 95% accuracy on larger models. Finally, we demonstrate the limitations posed by incorrect retrieval of vulnerability information and raise a cautionary note to the cybersecurity community regarding the blind reliance on LLMs and RAG for vulnerability analysis. LEA offers security analysts with a metric to audit RAG-enhanced workflows, improving the transparent and trustworthy deployment of AI in cybersecurity threat analysis.
Related papers
- Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation [3.058685580689604]
Security applications are increasingly relying on large language models (LLMs) for cyber threat detection.<n>Because security threats evolve rapidly, LLMs must not only recall historical incidents but also adapt to emerging vulnerabilities and attack patterns.<n>We introduce a RAG-based framework designed to contextualize cybersecurity data and enhance LLM accuracy in knowledge retention and temporal reasoning.
arXiv Detail & Related papers (2025-10-31T00:59:53Z) - POLAR: Automating Cyber Threat Prioritization through LLM-Powered Assessment [13.18964488705143]
Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats.<n>In this paper, we investigate the intrinsic vulnerabilities of LLMs in cyber threat intelligence (CTI)<n>We introduce a novel categorization methodology that integrates stratification, autoregressive refinement, and human-in-the-loop supervision.
arXiv Detail & Related papers (2025-10-02T00:49:20Z) - Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence [15.881854286231997]
Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats.<n>In this paper, we investigate the intrinsic vulnerabilities of LLMs in cyber threat intelligence (CTI)<n>We introduce a novel categorization methodology that integrates stratification, autoregressive refinement, and human-in-the-loop supervision.
arXiv Detail & Related papers (2025-09-28T02:08:27Z) - On the Surprising Efficacy of LLMs for Penetration-Testing [3.11537581064266]
The paper thoroughly reviews the evolution of Large Language Models (LLMs) in penetration testing.<n>It showcases their application across various offensive security tasks and covering broader phases of the cyber kill chain.<n>The paper identifies and discusses significant obstacles impeding wider adoption and safe deployment.
arXiv Detail & Related papers (2025-07-01T15:01:18Z) - Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation [16.29310628754089]
Large Language Models (LLMs) have become powerful tools for automated code generation.<n>LLMs often overlook critical security practices, which can result in the generation of insecure code.<n>This paper examines their inherent tendencies to produce insecure code, their capability to generate secure code when guided by self-generated vulnerability hints, and their effectiveness in repairing vulnerabilities when provided with different levels of feedback.
arXiv Detail & Related papers (2025-06-28T23:24:33Z) - SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment.<n>Certain scenarios suffer 25 times higher attack rates.<n>Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z) - Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions [60.31496362993982]
Large language models (LLMs) frequently generate confident yet inaccurate responses.<n>We present a novel, test-time approach to detecting model hallucination through systematic analysis of information flow.
arXiv Detail & Related papers (2024-12-13T16:14:49Z) - Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM [53.79753074854936]
Large language models (LLMs) are increasingly vulnerable to emerging jailbreak attacks.<n>This vulnerability poses significant risks to real-world applications.<n>We propose a novel defensive paradigm called GuidelineLLM.
arXiv Detail & Related papers (2024-12-10T12:42:33Z) - Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation [4.241100280846233]
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication.<n>This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents.
arXiv Detail & Related papers (2024-12-05T18:38:30Z) - SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z) - ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs [1.7191671053507043]
Security analysts face the challenge of mitigating newly discovered vulnerabilities in real-time.<n>Over 300,000 Common Vulnerabilities and Exposures have been identified since 1999.<n>Over 25,000 vulnerabilities have been identified so far in 2024.
arXiv Detail & Related papers (2024-10-22T20:28:57Z) - Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.<n>Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration.<n>To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks [12.893445918647842]
Large Language Models (LLMs) demonstrate impressive capabilities across various fields, yet their increasing use raises critical security concerns.
This article reviews recent literature addressing key issues in LLM security, with a focus on accuracy, bias, content detection, and vulnerability to attacks.
arXiv Detail & Related papers (2024-09-12T14:42:08Z) - Can LLMs be Fooled? Investigating Vulnerabilities in LLMs [4.927763944523323]
The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP)
This paper will synthesize the findings from each vulnerability section and propose new directions of research and development.
By understanding the focal points of current vulnerabilities, we can better anticipate and mitigate future risks.
arXiv Detail & Related papers (2024-07-30T04:08:00Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Learning to Poison Large Language Models for Downstream Manipulation [12.521338629194503]
This work identifies additional security risks in Large Language Models (LLMs) by designing a new data poisoning attack tailored to exploit the supervised fine-tuning process.<n>We propose a novel gradient-guided backdoor trigger learning (GBTL) algorithm to identify adversarial triggers efficiently.<n>We propose two defense strategies against data poisoning attacks, including in-context learning (ICL) and continuous learning (CL)
arXiv Detail & Related papers (2024-02-21T01:30:03Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.<n>Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.<n>We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge
Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z) - Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z) - Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.