Automated Mapping of Vulnerability Advisories onto their Fix Commits in
Open Source Repositories
- URL: http://arxiv.org/abs/2103.13375v2
- Date: Wed, 10 May 2023 06:47:20 GMT
- Title: Automated Mapping of Vulnerability Advisories onto their Fix Commits in
Open Source Repositories
- Authors: Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci,
Damian A. Tamburri
- Abstract summary: We present an approach that combines practical experience and machine-learning (ML)
An advisory record containing key information about a vulnerability is extracted from an advisory.
A subset of candidate fix commits is obtained from the source code repository of the affected project.
- Score: 7.629717457706326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of comprehensive sources of accurate vulnerability data represents a
critical obstacle to studying and understanding software vulnerabilities (and
their corrections). In this paper, we present an approach that combines
heuristics stemming from practical experience and machine-learning (ML) -
specifically, natural language processing (NLP) - to address this problem. Our
method consists of three phases. First, an advisory record containing key
information about a vulnerability is extracted from an advisory (expressed in
natural language). Second, using heuristics, a subset of candidate fix commits
is obtained from the source code repository of the affected project by
filtering out commits that are known to be irrelevant for the task at hand.
Finally, for each such candidate commit, our method builds a numerical feature
vector reflecting the characteristics of the commit that are relevant to
predicting its match with the advisory at hand. The feature vectors are then
exploited for building a final ranked list of candidate fixing commits. The
score attributed by the ML model to each feature is kept visible to the users,
allowing them to interpret the predictions.
We evaluated our approach using a prototype implementation named FixFinder on
a manually curated data set that comprises 2,391 known fix commits
corresponding to 1,248 public vulnerability advisories. When considering the
top-10 commits in the ranked results, our implementation could successfully
identify at least one fix commit for up to 84.03% of the vulnerabilities (with
a fix commit on the first position for 65.06% of the vulnerabilities). In
conclusion, our method reduces considerably the effort needed to search OSS
repositories for the commits that fix known vulnerabilities.
Related papers
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks.
SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues.
We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z) - CommitShield: Tracking Vulnerability Introduction and Fix in Version Control Systems [15.037460085046806]
CommitShield is a tool for detecting vulnerabilities in code commits.
It combines the code analysis capabilities of static analysis tools with the natural language and code understanding capabilities of large language models.
We show that CommitShield improves recall by 76%-87% over state-of-the-art methods in the vulnerability fix detection task.
arXiv Detail & Related papers (2025-01-07T08:52:55Z) - Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE [5.824774194964031]
We implement a vulnerability detection and fix tool with professional software developers on real projects that they own.
DeepVulGuard scans code for vulnerabilities, suggests fixes, provides natural-language explanations for alerts and fixes, leveraging chat interfaces.
We find that although state-of-the-art AI-powered detection and fix tools show promise, they are not yet practical for real-world use due to a high rate of false positives and non-applicable fixes.
arXiv Detail & Related papers (2024-12-18T20:19:56Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations.
First, existing methods often use coarse-grained of unsafe topics, and are over-representing some fine-grained topics.
Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations.
Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z) - Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z) - VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix
Identification [4.837912059099674]
VFFINDER is a graph-based approach for automated silent vulnerability fix identification.
It distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models.
Our results show that VFFINDER significantly improves the state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1.
arXiv Detail & Related papers (2023-09-05T05:55:18Z) - Multi-Granularity Detector for Vulnerability Fixes [13.653249890867222]
We propose MiDas (Multi-Granularity Detector for Vulnerability Fixes) to identify vulnerability-fixing commits.
MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level.
MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets.
arXiv Detail & Related papers (2023-05-23T10:06:28Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Learning Stable Classifiers by Transferring Unstable Features [59.06169363181417]
We study transfer learning in the presence of spurious correlations.
We experimentally demonstrate that directly transferring the stable feature extractor learned on the source task may not eliminate these biases for the target task.
We hypothesize that the unstable features in the source task and those in the target task are directly related.
arXiv Detail & Related papers (2021-06-15T02:41:12Z) - Detecting Security Fixes in Open-Source Repositories using Static Code
Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications.
We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes.
We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.