Related papers: Repository-Level Graph Representation Learning for Enhanced Security Patch Detection

URL: http://arxiv.org/abs/2412.08068v1
Date: Wed, 11 Dec 2024 03:29:56 GMT
Title: Repository-Level Graph Representation Learning for Enhanced Security Patch Detection
Authors: Xin-Cheng Wen, Zirui Lin, Cuiyun Gao, Hongyu Zhang, Yong Wang, Qing Liao
Abstract summary: This paper proposes a Repository-level Security Patch Detection framework named RepoSPD. RepoSPD comprises three key components: 1) a repository-level graph construction, RepoCPG, which represents software patches by merging pre-patch and post-patch source code at the repository level; 2) a structure-aware patch representation, which fuses the graph and sequence branch and aims at comprehending the relationship among multiple code changes; and 3) progressive learning, which facilitates the model in balancing semantic and structural information.
Score: 22.039868029497942
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software vendors often silently release security patches without providing sufficient advisories (e.g., Common Vulnerabilities and Exposures) or with delayed updates via resources (e.g., National Vulnerability Database). Therefore, it has become crucial to detect these security patches to ensure secure software maintenance. However, existing methods face the following challenges: (1) They primarily focus on the information within the patches themselves, overlooking the complex dependencies in the repository. (2) Security patches typically involve multiple functions and files, which makes it harder to learn good representations. To alleviate the above challenges, this paper proposes a Repository-level Security Patch Detection framework named RepoSPD, which comprises three key components: 1) a repository-level graph construction, RepoCPG, which represents software patches by merging pre-patch and post-patch source code at the repository level; 2) a structure-aware patch representation, which fuses the graph and sequence branch and aims at comprehending the relationship among multiple code changes; 3) progressive learning, which facilitates the model in balancing semantic and structural information. To evaluate RepoSPD, we employ two widely-used datasets in security patch detection: SPI-DB and PatchDB. We further extend these datasets to the repository level, incorporating a total of 20,238 and 28,781 repository versions in C/C++ programming languages, respectively, denoted as SPI-DB* and PatchDB*. We compare RepoSPD with six existing security patch detection methods and five static tools. Our experimental results demonstrate that RepoSPD outperforms the state-of-the-art baseline, with improvements of 11.90% and 3.10% in terms of accuracy on the two datasets, respectively.
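The abstract describes RepoCPG only at a high level, as a graph that merges pre-patch and post-patch source code. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes each version is already available as a set of nodes and a set of directed edges, and it tags every element as "deleted", "added", or "context". The function name merge_patch_graphs, the three labels, and the toy patch are all hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def _tag(pre: Set[Hashable], post: Set[Hashable]) -> Dict[Hashable, str]:
    """Label each item by whether it appears before the patch, after it, or both."""
    labels = {}
    for item in pre | post:
        if item in pre and item in post:
            labels[item] = "context"   # unchanged by the patch
        elif item in pre:
            labels[item] = "deleted"   # present only before the patch
        else:
            labels[item] = "added"     # present only after the patch
    return labels


def merge_patch_graphs(
    pre_nodes: Set[str],
    pre_edges: Set[Tuple[str, str]],
    post_nodes: Set[str],
    post_edges: Set[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[Tuple[str, str], str]]:
    """Union the two graphs (hypothetical helper) and tag nodes and edges by origin."""
    return _tag(pre_nodes, post_nodes), _tag(pre_edges, post_edges)


# Hypothetical toy patch: a length check is inserted before a memcpy call.
pre_nodes = {"read_len", "memcpy"}
pre_edges = {("read_len", "memcpy")}
post_nodes = {"read_len", "check_len", "memcpy"}
post_edges = {("read_len", "check_len"), ("check_len", "memcpy")}

node_labels, edge_labels = merge_patch_graphs(
    pre_nodes, pre_edges, post_nodes, post_edges
)
```

Keeping deleted and added elements in a single labelled graph lets a downstream model see both versions of the code alongside the unchanged context that connects them.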

Related papers

Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes [1.3606495556399092]
A critical task in vulnerability management is tracing the patches that fix a vulnerability. Previous work has shown that the patch information is often missing in vulnerability databases. We propose SITPatchTracer, a scalable full-repo full-context retrieval system.
arXiv Detail & Related papers (2025-03-29T01:53:07Z)
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG [65.0423152595537]
We propose MES-RAG, which enhances entity-specific query handling and provides accurate, secure, and consistent responses. MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access. Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering.
arXiv Detail & Related papers (2025-03-17T08:09:42Z)
Learning Graph-based Patch Representations for Identifying and Assessing Silent Vulnerability Fixes [5.983725940750908]
Software projects depend on many third-party libraries, so high-risk vulnerabilities can propagate through the dependency chain to downstream projects. Silent vulnerability fixes prevent downstream software from learning of urgent security issues in a timely manner, posing a security risk to the software. We propose GRAPE, a GRAph-based Patch rEpresentation that aims to provide a unified framework for obtaining representations of vulnerability-fix patches.
arXiv Detail & Related papers (2024-09-13T03:23:11Z)
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data. The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE). We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
ReposVul: A Repository-Level High-Quality Vulnerability Dataset [13.90550557801464]
We propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul. The proposed framework mainly contains three modules: (1) A vulnerability untangling module, aiming at distinguishing vulnerability-fixing related code changes from tangled patches, in which the Large Language Models (LLMs) and static analysis tools are jointly employed, (2) A multi-granularity dependency extraction module, aiming at capturing the inter-procedural call relationships of vulnerabilities, in which we construct multiple-granularity information for each vulnerability patch, including repository-level, file-level, function-level
arXiv Detail & Related papers (2024-01-24T01:27:48Z)
Just-in-Time Detection of Silent Security Patches [7.840762542485285]
Security patches can be silent, i.e., they do not always come with comprehensive advisories such as CVEs. This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities. We propose to leverage large language models (LLMs) to augment patch information with generated code change explanations.
arXiv Detail & Related papers (2023-12-02T22:53:26Z)
MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations [50.79913333804232]
We propose a memory-supported transformer (MS-Former) for weakly supervised change detection. MS-Former consists of a bi-directional attention block (BAB) and a patch-level supervision scheme (PSS). Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method in the change detection task.
arXiv Detail & Related papers (2023-11-16T09:57:29Z)
Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection [6.838615442552715]
We introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM. This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words. We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait.
arXiv Detail & Related papers (2023-08-29T11:41:21Z)
DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions. DeepfakeBench contains 15 state-of-the-art detection methods, 9 datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z)
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation [103.90033029330527]
Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples. We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS.
arXiv Detail & Related papers (2023-01-03T15:33:48Z)
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors. We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks. We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.