Just-in-Time Security Patch Detection -- LLM At the Rescue for Data
Augmentation
- URL: http://arxiv.org/abs/2312.01241v2
- Date: Tue, 12 Dec 2023 22:54:55 GMT
- Title: Just-in-Time Security Patch Detection -- LLM At the Rescue for Data
Augmentation
- Authors: Xunzhu Tang and Zhenghan Chen and Kisub Kim and Haoye Tian and Saad
Ezzini and Jacques Klein
- Abstract summary: We introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies.
Within LLMDA, we utilize labeled instructions to direct our LLMDA, differentiating patches based on security relevance.
We then use a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code.
- Score: 8.308196041232128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the face of growing vulnerabilities found in open-source software, the
need to identify discreet security patches has become paramount. The lack of
consistency in how software providers handle maintenance often leads to the
release of security patches without comprehensive advisories, leaving users
vulnerable to unaddressed security risks. To address this pressing issue, we
introduce a novel security patch detection system, LLMDA, which capitalizes on
Large Language Models (LLMs) and code-text alignment methodologies for patch
review, data enhancement, and feature combination. Within LLMDA, we initially
utilize LLMs for examining patches and expanding data of PatchDB and SPI-DB,
two security patch datasets from recent literature. We then use labeled
instructions to direct our LLMDA, differentiating patches based on security
relevance. Following this, we apply a PTFormer to merge patches with code,
formulating hybrid attributes that encompass both the innate details and the
interconnections between the patches and the code. This distinctive combination
method allows our system to capture more insights from the combined context of
patches and code, hence improving detection precision. Finally, we devise a
probabilistic batch contrastive learning mechanism within batches to augment
the capability of our LLMDA in discerning security patches. The results
reveal that LLMDA significantly surpasses the state-of-the-art techniques in
detecting security patches, underscoring its promise in fortifying software
maintenance.
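The abstract does not spell out the probabilistic batch contrastive mechanism. Purely as an illustration of the general idea (pulling same-label patch embeddings together within a batch), the following is a minimal supervised-contrastive-loss sketch; the embedding shape, temperature, and normalization are assumptions for the example, not LLMDA's actual components:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (SupCon-style sketch).

    embeddings: (N, D) array of patch representations
    labels: (N,) array, e.g. 1 = security patch, 0 = non-security patch
    """
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                 # pairwise scaled similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs
    # log-softmax over each row (each anchor's candidates)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = labels[:, None] == labels[None, :]   # positive-pair mask
    np.fill_diagonal(same, False)
    pos_counts = same.sum(axis=1)
    # average negative log-probability over each anchor's positives
    per_anchor = -np.where(same, log_prob, 0.0).sum(axis=1) / np.maximum(pos_counts, 1)
    return per_anchor[pos_counts > 0].mean()
```

Minimizing such a loss pushes security patches toward each other and away from non-security patches in embedding space, which is the intuition behind using batch contrastive learning for the detection task.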
Related papers
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback [7.742213291781287]
We present VRpilot, a vulnerability repair technique based on reasoning and patch validation feedback.
Our results show that VRpilot generates, on average, 14% and 7.6% more correct patches than the baseline techniques on C and Java, respectively.
arXiv Detail & Related papers (2024-05-24T16:29:48Z)
- Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching [77.36097118561057]
SafePatching is a novel framework for comprehensive and efficient PSA.
SafePatching achieves a more comprehensive and efficient PSA than baseline methods.
arXiv Detail & Related papers (2024-05-22T16:51:07Z)
- ReposVul: A Repository-Level High-Quality Vulnerability Dataset [13.90550557801464]
We propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul.
The proposed framework mainly contains three modules: (1) a vulnerability untangling module, which distinguishes vulnerability-fixing code changes from tangled patches by jointly employing Large Language Models (LLMs) and static analysis tools; (2) a multi-granularity dependency extraction module, which captures the inter-procedural call relationships of vulnerabilities by constructing repository-level, file-level, and function-level information for each vulnerability patch.
arXiv Detail & Related papers (2024-01-24T01:27:48Z)
- Can LLMs Patch Security Issues? [1.3299507495084417]
Large Language Models (LLMs) have shown impressive proficiency in code generation.
LLMs share a weakness with their human counterparts: producing code that inadvertently has security vulnerabilities.
We propose Feedback-Driven Security Patching (FDSP), where LLMs automatically refine generated, vulnerable code.
arXiv Detail & Related papers (2023-11-13T08:54:37Z)
- Fake Alignment: Are LLMs Really Aligned Well? [91.26543768665778]
This study investigates the substantial discrepancy in performance between multiple-choice questions and open-ended questions.
Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization.
arXiv Detail & Related papers (2023-11-10T08:01:23Z)
- CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context [16.69634193308039]
It is challenging to apply security patches in open source software timely because notifications of patches are often incomplete and delayed.
We propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm to accurately identify code related to the patches.
We empirically compare CompVPD with four state-of-the-art/practice (SOTA) approaches in identifying vulnerability patches.
arXiv Detail & Related papers (2023-10-04T02:08:18Z)
- Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection [6.838615442552715]
We introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM.
This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words.
We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait.
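MultiSEM's actual layers are not described in this summary. As a rough sketch of the fine-to-coarse idea only, one could pool word-centric vectors and combine them with a description-level vector; the mean-pooling and concatenation below are illustrative assumptions, not the paper's design:

```python
import numpy as np

def fuse_fine_to_coarse(word_vectors, desc_vector):
    """Hypothetical fine-to-coarse fusion for a patch representation.

    word_vectors: (num_words, D) fine-grained token embeddings of a patch
    desc_vector: (D,) coarse-grained embedding of the patch description
    Returns a single (2*D,) combined patch vector.
    """
    fine = word_vectors.mean(axis=0)             # pool word-centric vectors
    return np.concatenate([fine, desc_vector])   # append the holistic portrait
```

The point of such a fusion is that the fine-grained half preserves the weight of individual words while the coarse half contributes the overall semantics of the patch description.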
arXiv Detail & Related papers (2023-08-29T11:41:21Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors.
We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks.
We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.