AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities
- URL: http://arxiv.org/abs/2505.04195v1
- Date: Wed, 07 May 2025 07:49:05 GMT
- Title: AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities
- Authors: Minjae Seo, Wonwoo Choi, Myoungsung You, Seungwon Shin
- Abstract summary: Large Language Models (LLMs) have emerged as promising tools in software development. Their knowledge is limited to a fixed cutoff date, making them prone to generating code vulnerable to newly disclosed CVEs. We propose AutoPatch, a framework designed to patch vulnerable LLM-generated code.
- Score: 7.812032134834162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have emerged as promising tools in software development, enabling automated code generation and analysis. However, their knowledge is limited to a fixed cutoff date, making them prone to generating code vulnerable to newly disclosed CVEs. Frequent fine-tuning with new CVE sets is costly, and existing LLM-based approaches focus on oversimplified CWE examples and require providing explicit bug locations to LLMs, limiting their ability to patch complex real-world vulnerabilities. To address these limitations, we propose AutoPatch, a multi-agent framework designed to patch vulnerable LLM-generated code, particularly code affected by CVEs disclosed after the LLMs' knowledge cutoff. AutoPatch integrates Retrieval-Augmented Generation (RAG) with a structured database of recently disclosed vulnerabilities, comprising 525 code snippets derived from 75 high-severity CVEs across real-world systems such as the Linux kernel and Chrome. AutoPatch combines semantic and taint analysis to identify the most relevant CVE and leverages enhanced Chain-of-Thought (CoT) reasoning to construct enriched prompts for verification and patching. Our unified similarity model, which selects the most relevant vulnerabilities, achieves 90.4 percent accuracy in CVE matching. AutoPatch attains an 89.5 percent F1-score for vulnerability verification and 95.0 percent accuracy in patching, while being over 50x more cost-efficient than traditional fine-tuning approaches.
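The abstract describes the pipeline only at a high level. The snippet below is a minimal, illustrative sketch of such a RAG-style matching-and-patching loop; the record fields, the 0.6/0.4 weighting, the prompt wording, and the `embed`, `taints`, and `query_llm` callables are all assumptions for exposition, not AutoPatch's actual implementation.

```python
# Hedged sketch of a RAG-based CVE matching and patching loop.
# All names, weights, and prompt text are illustrative assumptions.
from dataclasses import dataclass
import math

@dataclass
class CVERecord:
    cve_id: str
    snippet: str                # vulnerable code excerpt stored in the CVE database
    embedding: list[float]      # semantic embedding of the snippet
    taint_sig: frozenset[str]   # taint-flow facts, e.g. {"copy_from_user->memcpy"}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def unified_similarity(code_emb, code_taint, rec: CVERecord, w_sem=0.6, w_taint=0.4):
    """Blend semantic similarity with taint-signature overlap (weights hypothetical)."""
    sem = cosine(code_emb, rec.embedding)
    taint = len(code_taint & rec.taint_sig) / max(len(code_taint | rec.taint_sig), 1)
    return w_sem * sem + w_taint * taint

def match_and_patch(generated_code: str, db: list[CVERecord], embed, taints, query_llm):
    """Retrieve the most relevant CVE, then prompt an LLM to verify and patch."""
    emb, taint = embed(generated_code), taints(generated_code)
    best = max(db, key=lambda rec: unified_similarity(emb, taint, rec))
    prompt = (
        f"Reference vulnerability {best.cve_id}:\n{best.snippet}\n\n"
        f"Candidate code:\n{generated_code}\n\n"
        "Reason step by step: does the candidate exhibit the same flaw? "
        "If so, return a patched version."
    )
    return query_llm(prompt)
```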
Related papers
- LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models [2.891351178680099]
This paper presents a novel framework integrating Code Property Graphs (CPGs) with Large Language Models (LLMs) for robust vulnerability detection. Our approach's ability to provide a more concise and accurate representation of code snippets enables the analysis of larger code segments. Empirical evaluation demonstrates LLMxCPG's effectiveness across verified datasets, achieving 15-40% improvements in F1-score over state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-22T13:36:33Z)
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope. We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models [9.946058168276744]
Large Language Models (LLMs) have opened up the possibility for many software defects to be patched automatically. We propose an iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We achieve a human-verified quality score of 8.51/10 and an increase in ground-truth code similarity of 20% with Llama 3 70B.
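The summary names an iterative pipeline without specifying it; the loop below is a generic sketch of iterative LLM-based repair (generate, check, feed back), with `query_llm` and `run_checks` as hypothetical stand-ins rather than LLM4CVE's actual components.

```python
# Generic iterative-repair loop in the spirit described above; the stopping
# criterion, feedback format, and helper callables are assumptions.
def iterative_repair(vulnerable_fn: str, query_llm, run_checks, max_rounds: int = 3) -> str:
    candidate = vulnerable_fn
    feedback = "First attempt: fix the vulnerability without changing intended behavior."
    for _ in range(max_rounds):
        candidate = query_llm(
            f"Vulnerable function:\n{vulnerable_fn}\n\n"
            f"Previous attempt:\n{candidate}\n\n"
            f"Feedback: {feedback}\n"
            "Return only the repaired function."
        )
        ok, feedback = run_checks(candidate)  # e.g. compile and run regression tests
        if ok:
            break
    return candidate
```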
arXiv Detail & Related papers (2025-01-07T00:21:42Z)
- There are More Fish in the Sea: Automated Vulnerability Repair via Binary Templates [4.907610470063863]
We propose TemVUR, a template-based automated vulnerability repair approach for Java binaries. Experiments on the Vul4J dataset demonstrate that TemVUR successfully repairs 11 vulnerabilities. To assess the generalizability of TemVUR, we curate the ManyVuls4J dataset.
arXiv Detail & Related papers (2024-11-27T06:59:45Z)
- Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
- APPATCH: Automated Adaptive Prompting Large Language Models for Real-World Software Vulnerability Patching [24.958856670970366]
In this paper, we leverage the power and merits of pre-trained large language models (LLMs) to enable automated vulnerability patching. To elicit LLMs to effectively reason about vulnerable code behaviors, we introduce vulnerability semantics reasoning and adaptive prompting. Our evaluation of APPATCH on 97 zero-day vulnerabilities and 20 existing vulnerabilities demonstrates its superior performance to both existing methods and state-of-the-art non-LLM-based techniques.
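Adaptive prompting is not detailed in this summary; the helper below sketches one plausible reading, selecting the exemplars most similar to the target function and prepending a semantics-focused reasoning instruction. The `similarity` callable and the exemplar format are hypothetical, not the APPATCH implementation.

```python
# Hypothetical adaptive-prompt builder; exemplar selection and wording are
# illustrative only.
def build_adaptive_prompt(target_fn: str, exemplars: list[dict], similarity, k: int = 2) -> str:
    # Pick the k exemplars whose code most resembles the target function.
    chosen = sorted(exemplars, key=lambda e: similarity(target_fn, e["code"]), reverse=True)[:k]
    shots = "\n\n".join(
        f"Example (CWE {e['cwe']}):\n{e['code']}\nPatch:\n{e['patch']}" for e in chosen
    )
    return (
        f"{shots}\n\n"
        "Reason about how untrusted data reaches the dangerous operation in the "
        f"function below, then emit a minimal patch.\n\n{target_fn}"
    )
```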
arXiv Detail & Related papers (2024-08-24T14:51:50Z)
- PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software [15.867607171943698]
We propose PatchFinder, a two-phase framework with end-to-end correlation learning for better tracing of security patches.
PatchFinder achieves a Recall@10 of 80.63% and a Mean Reciprocal Rank (MRR) of 0.7951.
When applying PatchFinder in practice, we initially identified 533 patch commits and submitted them for official review, 482 of which have been confirmed by CVE Numbering Authorities.
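Recall@10 and MRR are standard ranking metrics; the sketch below shows how they are typically computed over ranked candidate commits. The input format (one ranked candidate list plus one true patch commit per CVE) is an assumption about the evaluation setup.

```python
# Standard ranking metrics as usually defined; data shapes are assumed.
def recall_at_k(ranked_lists, true_commits, k=10):
    hits = sum(1 for ranked, t in zip(ranked_lists, true_commits) if t in ranked[:k])
    return hits / len(true_commits)

def mean_reciprocal_rank(ranked_lists, true_commits):
    total = 0.0
    for ranked, t in zip(ranked_lists, true_commits):
        if t in ranked:
            total += 1.0 / (ranked.index(t) + 1)  # rank is 1-based
    return total / len(true_commits)

# Toy example: true patches ranked 1st and 3rd -> Recall@10 = 1.0, MRR ~= 0.667
print(recall_at_k([["c1", "c2"], ["a", "b", "c3"]], ["c1", "c3"]))
print(mean_reciprocal_rank([["c1", "c2"], ["a", "b", "c3"]], ["c1", "c3"]))
```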
arXiv Detail & Related papers (2024-07-24T07:46:24Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potentially vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
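VELVET is described above only as combining graph-based and sequence-based networks to mix local and global context; the fragment below is a toy fusion sketch under that description (assuming PyTorch is available). The encoder choices, dimensions, and the 1:1 alignment between tokens and graph nodes are placeholders, not VELVET's architecture.

```python
# Toy fusion of a sequence encoder and a simple graph propagation step;
# everything here is a placeholder, not the VELVET model.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.seq_enc = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
        self.graph_proj = nn.Linear(dim, dim)   # stand-in for a trained graph encoder
        self.head = nn.Linear(2 * dim, 1)       # one vulnerability score per statement

    def forward(self, token_embs, node_embs, adj):
        # token_embs, node_embs: (batch, n, dim); adj: (n, n) normalized adjacency
        seq_out, _ = self.seq_enc(token_embs)                     # global sequential context
        graph_out = torch.relu(self.graph_proj(adj @ node_embs))  # local structural context
        fused = torch.cat([seq_out, graph_out], dim=-1)           # assumes tokens align 1:1 with nodes
        return self.head(fused).squeeze(-1)

# Toy usage: one function with 10 statements and 64-dim features per statement.
model = FusionClassifier()
scores = model(torch.randn(1, 10, 64), torch.randn(1, 10, 64), torch.eye(10))
print(scores.shape)  # torch.Size([1, 10])
```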