Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation
- URL: http://arxiv.org/abs/2312.01241v2
- Date: Tue, 12 Dec 2023 22:54:55 GMT
- Title: Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation
- Authors: Xunzhu Tang and Zhenghan Chen and Kisub Kim and Haoye Tian and Saad Ezzini and Jacques Klein
- Abstract summary: We introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies.
Within LLMDA, we use labeled instructions to guide the model in differentiating patches by security relevance.
We then use a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code.
- Score: 8.308196041232128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the face of growing vulnerabilities found in open-source software, the
need to identify discreet security patches has become paramount. The lack of
consistency in how software providers handle maintenance often leads to the
release of security patches without comprehensive advisories, leaving users
vulnerable to unaddressed security risks. To address this pressing issue, we
introduce a novel security patch detection system, LLMDA, which capitalizes on
Large Language Models (LLMs) and code-text alignment methodologies for patch
review, data enhancement, and feature combination. Within LLMDA, we first use
LLMs to examine patches and augment the data in PatchDB and SPI-DB, two
security patch datasets from recent literature. We then use labeled
instructions to guide LLMDA in differentiating patches by security relevance.
Next, we apply PTFormer to merge patches with code, forming hybrid attributes
that encompass both the innate details of the patches and code and the
interconnections between them. This combination method lets the system draw
more information from the joint context of patches and code, improving
detection precision. Finally, we devise a probabilistic batch contrastive
learning mechanism to strengthen LLMDA's ability to discern security patches.
The results show that LLMDA significantly outperforms state-of-the-art
techniques in detecting security patches, underscoring its promise for
strengthening software maintenance.
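The abstract names two mechanisms, fusing patch and text features into hybrid attributes and a contrastive objective computed within batches, without giving their formulations. The sketch below illustrates the general idea under stated assumptions: `fuse` is a hypothetical concatenate-and-normalize stand-in for PTFormer (not its real architecture), and `supcon_loss` is the standard supervised contrastive loss (Khosla et al., 2020), swapped in for the paper's probabilistic batch contrastive learning, whose exact form is not stated here. All function names and the toy embeddings are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse(patch_emb: Sequence[float], text_emb: Sequence[float]) -> Vector:
    """Hypothetical stand-in for PTFormer: concatenate a patch embedding with a
    text embedding and normalize the result into one 'hybrid' feature vector."""
    return l2_normalize(list(patch_emb) + list(text_emb))


def supcon_loss(embs: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss over a single batch (generic, not LLMDA's).

    Each sample acts as an anchor; other samples with the same label are its
    positives, and all other samples in the batch form the softmax denominator.
    Anchors with no positive in the batch are skipped.
    """
    if len(embs) != len(labels):
        raise ValueError("embs and labels must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = [l2_normalize(e) for e in embs]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Mean negative log-probability assigned to this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    # Toy batch of fused features; label 1 = security patch, 0 = non-security.
    batch = [fuse([1.0, 0.0], [0.9, 0.1]), fuse([0.9, 0.1], [1.0, 0.0]),
             fuse([0.0, 1.0], [0.1, 0.9]), fuse([0.1, 0.9], [0.0, 1.0])]
    print(supcon_loss(batch, [1, 1, 0, 0]))  # small: same-label samples are close
    print(supcon_loss(batch, [1, 0, 1, 0]))  # large: positives point apart
```

Because the loss only compares samples inside one batch, its training signal depends on each batch containing several patches of each class; an anchor with no same-label partner in its batch contributes nothing.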
Related papers
- Fixing Security Vulnerabilities with AI in OSS-Fuzz [9.730566646484304]
OSS-Fuzz is the most significant and widely used infrastructure for continuous validation of open source systems.
We customise the well-known AutoCodeRover agent for fixing security vulnerabilities.
Our experience with OSS-Fuzz vulnerability data shows that LLM agent autonomy is useful for successful security patching.
(2024-11-03T16:20:32Z)
- LLM-Enhanced Software Patch Localization [24.1593187492973]
Security patch localization (SPL) recommendation methods are leading approaches to address this.
We introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE.
Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall.
(2024-09-10T18:52:40Z)
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
(2024-09-10T12:01:43Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
(2024-09-10T10:12:37Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
(2024-07-23T15:31:26Z)
- Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching [77.36097118561057]
SafePatching is a novel framework for comprehensive and efficient post safety alignment (PSA).
SafePatching achieves a more comprehensive and efficient PSA than baseline methods.
(2024-05-22T16:51:07Z)
- ReposVul: A Repository-Level High-Quality Vulnerability Dataset [13.90550557801464]
We propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul.
The proposed framework mainly contains three modules: (1) A vulnerability untangling module, aiming at distinguishing vulnerability-fixing related code changes from tangled patches, in which the Large Language Models (LLMs) and static analysis tools are jointly employed, (2) A multi-granularity dependency extraction module, aiming at capturing the inter-procedural call relationships of vulnerabilities, in which we construct multiple-granularity information for each vulnerability patch, including repository-level, file-level, function-level
(2024-01-24T01:27:48Z)
- Fake Alignment: Are LLMs Really Aligned Well? [91.26543768665778]
This study investigates the substantial discrepancy in performance between multiple-choice questions and open-ended questions.
Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization.
(2023-11-10T08:01:23Z)
- CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context [16.69634193308039]
It is challenging to apply security patches to open source software in a timely manner because notifications of patches are often incomplete and delayed.
We propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm to accurately identify code related to the patches.
We empirically compare CompVPD with four state-of-the-art/practice (SOTA) approaches in identifying vulnerability patches.
(2023-10-04T02:08:18Z)
- Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection [6.838615442552715]
We introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM.
This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words.
We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait.
(2023-08-29T11:41:21Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
(2023-02-23T17:14:38Z)