Just-in-Time Detection of Silent Security Patches
- URL: http://arxiv.org/abs/2312.01241v3
- Date: Mon, 25 Nov 2024 18:21:16 GMT
- Title: Just-in-Time Detection of Silent Security Patches
- Authors: Xunzhu Tang, Zhenghan Chen, Kisub Kim, Haoye Tian, Saad Ezzini, Jacques Klein, et al.
- Abstract summary: Security patches can be *silent*, i.e., they do not always come with comprehensive advisories such as CVEs.
This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities.
We propose to leverage large language models (LLMs) to augment patch information with generated code change explanations.
- Score: 7.840762542485285
- Abstract: Open-source code is pervasive. In this setting, embedded vulnerabilities are spreading to downstream software at an alarming rate. While such vulnerabilities are generally identified and addressed rapidly, inconsistent maintenance policies may lead security patches to go unnoticed. Indeed, security patches can be *silent*, i.e., they do not always come with comprehensive advisories such as CVEs. This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities. Consequently, identifying silent security patches just in time when they are released is essential for preventing n-day attacks and for ensuring robust and secure maintenance practices. With LLMDA, we propose to (1) leverage large language models (LLMs) to augment patch information with generated code change explanations, (2) design a representation learning approach that explores code-text alignment methodologies for feature combination, (3) implement label-wise training with labelled instructions to guide the embedding based on security relevance, and (4) rely on a probabilistic batch contrastive learning mechanism to build a high-precision identifier of security patches. We evaluate LLMDA on the PatchDB and SPI-DB literature datasets and show that our approach substantially improves over the state-of-the-art, notably GraphSPD, by 20% in terms of F-Measure on the SPI-DB benchmark.
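As a rough illustration of component (4), the sketch below implements a label-aware batch contrastive loss over patch embeddings; the supervised-contrastive form, temperature, and dimensions are assumptions for illustration, not LLMDA's exact probabilistic formulation.

```python
# Minimal sketch of a label-aware batch contrastive loss for patch
# embeddings, in the spirit of LLMDA's component (4). The supervised
# contrastive form, temperature, and dimensions are assumptions, not
# the paper's exact probabilistic formulation.
import torch
import torch.nn.functional as F

def batch_contrastive_loss(emb: torch.Tensor, labels: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    # emb: (B, d) patch embeddings; labels: (B,), 1 = security patch.
    z = F.normalize(emb, dim=-1)
    sim = z @ z.t() / tau                          # pairwise similarities
    eye = torch.eye(len(labels), dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))      # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Average log-probability of same-label pairs for each anchor.
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(-1) / pos.sum(-1).clamp(min=1)
    return -per_anchor[pos.any(-1)].mean()

emb = torch.randn(8, 64, requires_grad=True)       # toy batch
labels = torch.tensor([1, 0, 1, 0, 1, 1, 0, 0])
loss = batch_contrastive_loss(emb, labels)
loss.backward()                                    # gradients flow to emb
print(float(loss))
```

Pulling same-labeled patches together and pushing differently labeled ones apart is what lets the resulting embedding separate security-relevant changes from routine ones.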
Related papers
- Learning Graph-based Patch Representations for Identifying and Assessing Silent Vulnerability Fixes [5.983725940750908]
Software projects depend on many third-party libraries; high-risk vulnerabilities can therefore propagate through the dependency chain to downstream projects.
Silent vulnerability fixes prevent downstream software from learning about urgent security issues in a timely manner, posing a security risk.
We propose GRAPE, a GRAph-based Patch rEpresentation that aims to provide a unified framework for representing vulnerability-fixing patches.
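As a rough illustration of the graph-based idea (not GRAPE's actual construction), the sketch below parses a unified diff into a line-level graph with networkx; the toy diff, node attributes, and adjacency scheme are all assumptions.

```python
# Minimal sketch: turning a unified diff into a line-level graph.
# Illustrates the general graph-based patch representation idea,
# not GRAPE's actual construction; the toy diff is an assumption.
import networkx as nx

DIFF = """\
@@ -10,4 +10,4 @@
 context line
-    if (len < size) {
+    if (len < size && len >= 0) {
 context line
"""

def diff_to_graph(diff_text: str) -> nx.Graph:
    g = nx.Graph()
    prev = None
    for i, line in enumerate(diff_text.splitlines()):
        if line.startswith("@@"):
            prev = None            # new hunk: break the adjacency chain
            continue
        kind = {"+": "added", "-": "removed"}.get(line[:1], "context")
        g.add_node(i, kind=kind, text=line[1:].strip())
        if prev is not None:
            g.add_edge(prev, i)    # sequential adjacency within a hunk
        prev = i
    return g

graph = diff_to_graph(DIFF)
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
```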
arXiv Detail & Related papers (2024-09-13T03:23:11Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
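For context, the following minimal sketch enumerates the installed distributions of a Python environment into an SBOM-like component inventory using only the standard library; it illustrates the task PIP-sbom targets and is not the PIP-sbom tool itself.

```python
# Minimal sketch of an SBOM-like component inventory for a Python
# environment, using only the standard library. It illustrates the
# task PIP-sbom targets; it is not the PIP-sbom tool itself.
import json
from importlib import metadata

def environment_components() -> list:
    components = []
    for dist in metadata.distributions():
        components.append({
            "name": dist.metadata["Name"],
            "version": dist.version,
            # License metadata is often missing or inconsistent, one of
            # the inaccuracies SBOM tooling has to cope with.
            "license": dist.metadata.get("License", "UNKNOWN"),
        })
    return components

print(json.dumps(environment_components()[:5], indent=2))
```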
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching [77.36097118561057]
SafePatching is a novel framework for comprehensive and efficient post safety alignment (PSA).
SafePatching achieves more comprehensive and efficient PSA than baseline methods.
arXiv Detail & Related papers (2024-05-22T16:51:07Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
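A minimal sketch of the general transformation, assuming a hypothetical template rather than CodeAttack's actual encoding: a natural-language query is wrapped into a code-completion prompt.

```python
# Minimal sketch of the transformation CodeAttack studies: embedding a
# natural-language query inside a code-completion prompt. The template
# below is a hypothetical illustration, not the paper's actual encoding.
def to_code_prompt(query: str) -> str:
    return (
        "def solve_task():\n"
        f"    task = {query!r}\n"
        "    steps = []\n"
        "    # Complete the function: append to `steps` each action\n"
        "    # needed to accomplish `task`.\n"
    )

print(to_code_prompt("a natural-language request goes here"))
```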
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications [69.13807233595455]
Large language models (LLMs) show inherent brittleness in their safety mechanisms.
This study explores the brittleness of safety alignment by leveraging pruning and low-rank modifications.
We show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted.
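As a minimal sketch of the kind of low-cost edit the study considers, the snippet below applies a rank-4 update W + AB to a weight matrix, plus a masked variant that leaves a hypothetical safety-critical block untouched; shapes, rank, and the mask are illustrative assumptions.

```python
# Minimal sketch of a low-rank weight modification, the kind of cheap
# edit the study uses to probe safety-critical regions. Shapes, rank,
# and the masked "safety-critical" block are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

W = rng.normal(size=(d_out, d_in))            # original weights
A = 0.01 * rng.normal(size=(d_out, rank))     # low-rank factors
B = 0.01 * rng.normal(size=(rank, d_in))

W_modified = W + A @ B                        # rank-4 edit of the whole matrix
print(np.linalg.matrix_rank(W_modified - W))  # 4

# Restricted variant: leave a hypothetical safety-critical block intact,
# mirroring the finding that low-cost attacks still succeed around it.
mask = np.ones_like(W)
mask[:8, :8] = 0.0
W_restricted = W + mask * (A @ B)
```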
arXiv Detail & Related papers (2024-02-07T18:34:38Z)
- ReposVul: A Repository-Level High-Quality Vulnerability Dataset [13.90550557801464]
We propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul.
The proposed framework mainly contains three modules: (1) a vulnerability untangling module, which distinguishes vulnerability-fixing code changes from tangled patches by jointly employing large language models (LLMs) and static analysis tools, and (2) a multi-granularity dependency extraction module, which captures the inter-procedural call relationships of vulnerabilities by constructing multi-granularity information for each vulnerability patch at the repository, file, and function levels.
arXiv Detail & Related papers (2024-01-24T01:27:48Z)
- CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context [16.69634193308039]
It is challenging to apply security patches in open-source software in a timely manner because patch notifications are often incomplete and delayed.
We propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm to accurately identify code related to the patches.
We empirically compare CompVPD with four state-of-the-art and state-of-practice (SOTA) approaches in identifying vulnerability patches.
arXiv Detail & Related papers (2023-10-04T02:08:18Z)
- VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix Identification [4.837912059099674]
VFFINDER is a graph-based approach for automated silent vulnerability fix identification.
It distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models.
Our results show that VFFINDER improves over state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1.
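For intuition, here is a minimal single-head graph attention layer in plain PyTorch, sketching the attention-based GNN family such approaches build on; the layer, dimensions, and toy graph are illustrative assumptions, not VFFINDER's architecture.

```python
# Minimal single-head graph attention layer in plain PyTorch, sketching
# the attention-based GNN family such approaches build on. The layer,
# dimensions, and toy graph are assumptions, not VFFINDER's architecture.
import torch
import torch.nn as nn

class GraphAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node features; adj: (n, n) 0/1 adjacency matrix.
        h = self.proj(x)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = self.score(pairs).squeeze(-1)            # (n, n) edge scores
        logits = logits.masked_fill(adj == 0, float("-inf"))
        attn = torch.softmax(logits, dim=-1)              # attention over neighbors
        return attn @ h                                   # aggregated node features

# Toy usage: 4 nodes in a chain, with self-loops so softmax is well-defined.
x = torch.randn(4, 8)
adj = torch.eye(4) + torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0],
                                   [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
print(GraphAttention(8)(x, adj).shape)  # torch.Size([4, 8])
```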
arXiv Detail & Related papers (2023-09-05T05:55:18Z)
- Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection [6.838615442552715]
We introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM.
This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words.
We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait.
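As a toy sketch of the fine-to-coarse idea (not MultiSEM's model), the snippet below averages word-level vectors of the code change and concatenates a separate embedding of the patch description; the hashing-trick embeddings and dimensions are assumptions.

```python
# Toy sketch of a fine-to-coarse patch embedding: average word-level
# vectors of the code change, then concatenate an embedding of the
# patch description. Hashing-trick embeddings and dimensions are
# assumptions for illustration, not MultiSEM's model.
import zlib
import numpy as np

DIM = 16

def word_vec(token: str) -> np.ndarray:
    # Hashing trick: a deterministic pseudo-embedding per token.
    return np.random.default_rng(zlib.crc32(token.encode())).normal(size=DIM)

def embed(text: str) -> np.ndarray:
    vecs = [word_vec(t) for t in text.split()] or [np.zeros(DIM)]
    return np.mean(vecs, axis=0)

patch_code = "if ( len < size && len >= 0 )"   # fine-grained: the change itself
description = "fix out of bounds read"          # coarse: natural-language intent

multilevel = np.concatenate([embed(patch_code), embed(description)])
print(multilevel.shape)  # (32,)
```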
arXiv Detail & Related papers (2023-08-29T11:41:21Z)
- Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
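For intuition about the ensemble idea, the sketch below scores each statement with an untrained sequence model and an untrained one-step graph model, then averages the scores; both stand-ins and their dimensions are assumptions, not VELVET's actual architecture.

```python
# Sketch of the ensemble idea: score each statement with a
# sequence-based and a graph-based model, then average. Both models
# are untrained stand-ins with illustrative dimensions; this is not
# VELVET's actual architecture.
import torch
import torch.nn as nn

DIM, n_statements = 32, 10
feats = torch.randn(n_statements, DIM)           # per-statement features
adj = torch.eye(n_statements)                    # trivial program graph

seq_model = nn.GRU(DIM, DIM, batch_first=True)   # local, sequential context

class GraphScorer(nn.Module):                    # global, structural context
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(DIM, 1)

    def forward(self, x, adj):
        return self.lin(adj @ x).squeeze(-1)     # one message-passing step

seq_out, _ = seq_model(feats.unsqueeze(0))       # (1, n, DIM)
seq_scores = seq_out.squeeze(0).mean(dim=-1)     # crude per-statement score
graph_scores = GraphScorer()(feats, adj)

ensemble = 0.5 * seq_scores + 0.5 * graph_scores
print("most suspicious statement:", int(ensemble.argmax()))
```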