Related papers: Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

URL: http://arxiv.org/abs/2103.13375v2
Date: Wed, 10 May 2023 06:47:20 GMT
Title: Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories
Authors: Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, Damian A. Tamburri
Abstract summary: We present an approach that combines practical experience and machine-learning (ML) An advisory record containing key information about a vulnerability is extracted from an advisory. A subset of candidate fix commits is obtained from the source code repository of the affected project.
Score: 7.629717457706326
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of three phases. First, an advisory record containing key information about a vulnerability is extracted from an advisory (expressed in natural language). Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that are known to be irrelevant for the task at hand. Finally, for each such candidate commit, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. The feature vectors are then exploited for building a final ranked list of candidate fixing commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to interpret the predictions. We evaluated our approach using a prototype implementation named FixFinder on a manually curated data set that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit on the first position for 65.06% of the vulnerabilities). In conclusion, our method reduces considerably the effort needed to search OSS repositories for the commits that fix known vulnerabilities.

Related papers

EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting. Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines. Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
CommitShield: Tracking Vulnerability Introduction and Fix in Version Control Systems [15.037460085046806]
CommitShield is a tool for detecting vulnerabilities in code commits. It combines the code analysis capabilities of static analysis tools with the natural language and code understanding capabilities of large language models. We show that CommitShield improves recall by 76%-87% over state-of-the-art methods in the vulnerability fix detection task.
arXiv Detail & Related papers (2025-01-07T08:52:55Z)
Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE [5.824774194964031]
We implement a vulnerability detection and fix tool with professional software developers on real projects that they own. DeepVulGuard scans code for vulnerabilities, suggests fixes, provides natural-language explanations for alerts and fixes, leveraging chat interfaces. We find that although state-of-the-art AI-powered detection and fix tools show promise, they are not yet practical for real-world use due to a high rate of false positives and non-applicable fixes.
arXiv Detail & Related papers (2024-12-18T20:19:56Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations. First, existing methods often use coarse-grained of unsafe topics, and are over-representing some fine-grained topics. Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations. Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE) We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods. We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context [16.69634193308039]
It is challenging to apply security patches in open source software timely because notifications of patches are often incomplete and delayed. We propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm to accurately identify code related to the patches. We empirically compare CompVPD with four state-of-the-art/practice (SOTA) approaches in identifying vulnerability patches.
arXiv Detail & Related papers (2023-10-04T02:08:18Z)
VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix Identification [4.837912059099674]
VFFINDER is a graph-based approach for automated silent vulnerability fix identification. It distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models. Our results show that VFFINDER significantly improves the state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1.
arXiv Detail & Related papers (2023-09-05T05:55:18Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
Multi-Granularity Detector for Vulnerability Fixes [13.653249890867222]
We propose MiDas (Multi-Granularity Detector for Vulnerability Fixes) to identify vulnerability-fixing commits. MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level. MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets.
arXiv Detail & Related papers (2023-05-23T10:06:28Z)
VulCurator: A Vulnerability-Fixing Commit Detector [8.32137934421055]
VulCurator is a tool that leverages deep learning on richer sources of information. VulCurator outperforms the state-of-the-art baselines up to 16.1% in terms of F1-score.
arXiv Detail & Related papers (2022-09-07T16:11:31Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Learning Stable Classifiers by Transferring Unstable Features [59.06169363181417]
We study transfer learning in the presence of spurious correlations. We experimentally demonstrate that directly transferring the stable feature extractor learned on the source task may not eliminate these biases for the target task. We hypothesize that the unstable features in the source task and those in the target task are directly related.
arXiv Detail & Related papers (2021-06-15T02:41:12Z)
Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.