Related papers: NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair

NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair

URL: http://arxiv.org/abs/2405.04994v1
Date: Wed, 8 May 2024 11:58:55 GMT
Title: NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair
Authors: Ruoke Wang, Zongjie Li, Chaozheng Wang, Yang Xiao, Cuiyun Gao,
Abstract summary: NAVRepair is a novel framework that combines the node-type information extracted fromASTs with error types, specifically targeting C/C++ vulnerabilities. We achieve a 26% higher accuracy compared to an existing LLM-based C/C++ vulnerability repair method.
Score: 14.152755184229374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of deep learning has led to the development of Large Language Models (LLMs). In the field of vulnerability repair, previous research has leveraged rule-based fixing, pre-trained models, and LLM's prompt engineering. However, existing approaches have limitations in terms of the integration of code structure with error types. Besides, due to certain features of C/C++ language, vulnerability repair in C/C++ proves to be exceptionally challenging. To address these challenges, we propose NAVRepair, a novel framework that combines the node-type information extracted from Abstract Syntax Trees (ASTs) with error types, specifically targeting C/C++ vulnerabilities. Specifically, our approach employs type analysis to localize the minimum edit node (MEN) and customizes context information collection based on different error types. In the offline stage, NAVRepair parses code patches to locate MENs and designs rules to extract relevant contextual information for each MEN type. In the online repairing stage, it analyzes the suspicious code, combines it with vulnerability type templates derived from the Common Weakness Enumeration (CWE), and generates targeted repair prompts. We evaluate NAVRepair on multiple popular LLMs and demonstrate its effectiveness in improving the performance of code vulnerability repair. Notably, our framework is independent of any specific LLMs and can quickly adapt to new vulnerability types. Extensive experiments validate that NAVRepair achieves excellent results in assisting LLMs to accurately detect and fix C/C++ vulnerabilities. We achieve a 26% higher accuracy compared to an existing LLM-based C/C++ vulnerability repair method. We believe our node type-aware approach has promising application prospects for enhancing real-world C/C++ code security.

Related papers

Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language.<n>We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program.<n>We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
We present D-LiFT, an automated decompiler backend that harnesses and trains LLMs to improve the quality of decompiled code via reinforcement learning (RL)<n>D-LiFT adheres to a key principle for enhancing the quality of decompiled code: textitpreserving accuracy while improving readability.<n>Central to D-LiFT, we propose D-SCORE, an integrated quality assessment system to score the decompiled code from multiple aspects.
arXiv Detail & Related papers (2025-06-11T19:09:08Z)
OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities [54.152681077418805]
Current detection approaches are fallible, and are particularly susceptible to attacks that exploit mismatched generalizations of model capabilities.<n>We propose OMNIGUARD, an approach for detecting harmful prompts across languages and modalities.<n>Our approach improves harmful prompt classification accuracy by 11.57% over the strongest baseline in a multilingual setting.
arXiv Detail & Related papers (2025-05-29T05:25:27Z)
ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
CommitShield: Tracking Vulnerability Introduction and Fix in Version Control Systems [15.037460085046806]
CommitShield is a tool for detecting vulnerabilities in code commits. It combines the code analysis capabilities of static analysis tools with the natural language and code understanding capabilities of large language models. We show that CommitShield improves recall by 76%-87% over state-of-the-art methods in the vulnerability fix detection task.
arXiv Detail & Related papers (2025-01-07T08:52:55Z)
Learning Graph-based Patch Representations for Identifying and Assessing Silent Vulnerability Fixes [5.983725940750908]
Software projects are dependent on many third-party libraries, therefore high-risk vulnerabilities can propagate through the dependency chain to downstream projects. Silent vulnerability fixes cause downstream software to be unaware of urgent security issues in a timely manner, posing a security risk to the software. We propose GRAPE, a GRAph-based Patch rEpresentation that aims to provide a unified framework for getting vulnerability fix patches representation.
arXiv Detail & Related papers (2024-09-13T03:23:11Z)
Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models [17.96542494363619]
Large language models (LLMs) have demonstrated remarkable capabilities in comprehending complex contexts. In this paper, we conduct a study to investigate the capabilities of LLMs in both detecting and explaining vulnerabilities. Under specialized fine-tuning for vulnerability explanation, our LLMVulExp not only detects the types of vulnerabilities in the code but also analyzes the code context to generate the cause, location, and repair suggestions.
arXiv Detail & Related papers (2024-06-14T04:01:25Z)
M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization. Code models such CodeBERT are easy to fine-tune, but it is often difficult to learn vulnerability semantics from complex code languages. This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z)
DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection [12.686480870065827]
This paper contributes textbfDLAP, a framework that combines the best of both deep learning (DL) models and Large Language Models (LLMs) to achieve exceptional vulnerability detection performance. Experiment results confirm that DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts.
arXiv Detail & Related papers (2024-05-02T11:44:52Z)
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks. Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers [74.7446827091938]
We introduce an automatic prompt textbfDecomposition and textbfReconstruction framework for jailbreak textbfAttack (DrAttack) DrAttack includes three key components: (a) Decomposition' of the original prompt into sub-prompts, (b) Reconstruction' of these sub-prompts implicitly by in-context learning with semantically similar but harmless reassembling demo, and (c) a Synonym Search' of sub-prompts, aiming to find sub-prompts' synonyms that maintain the original intent while
arXiv Detail & Related papers (2024-02-25T17:43:29Z)
ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching [6.556868623811133]
Security critical software, e.g., OpenSSL, comes with numerous side-channel leakages left unpatched due to a lack of resources or experts. We explore the use of Large Language Models (LLMs) in generating patches for vulnerable code with microarchitectural side-channel leakages.
arXiv Detail & Related papers (2023-08-24T20:04:36Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently. The model builds an accurate understanding of code semantics with a lot less learnable parameters. The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.