Related papers: Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers

Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers

URL: http://arxiv.org/abs/2105.03346v1
Date: Fri, 7 May 2021 15:57:17 GMT
Title: Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers
Authors: Therese Fehrer, Roc\'io Cabrera Lozoya, Antonino Sabetta, Dario Di Nucci, Damian A. Tamburri
Abstract summary: We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
Score: 8.716427214870459
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The sources of reliable, code-level information about vulnerabilities that affect open-source software (OSS) are scarce, which hinders a broad adoption of advanced tools that provide code-level detection and assessment of vulnerable OSS dependencies. In this paper, we study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. In particular, we investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We analyze such embeddings for security-relevant and non-security-relevant commits, and we show that, although in isolation they are not different in a statistically significant manner, it is possible to use them to construct a ML pipeline that achieves results comparable with the state of the art. We also found that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities: the ML models we construct and commit2vec are complementary, the former being more generally applicable, albeit not as accurate.

Related papers

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge. Users pay for services based on advertised model capabilities. providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs. This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
Bomfather: An eBPF-based Kernel-level Monitoring Framework for Accurate Identification of Unknown, Unused, and Dynamically Loaded Dependencies in Modern Software Supply Chains [0.0]
Inaccuracies in dependency-tracking methods undermine the security and integrity of modern software supply chains. This paper introduces a kernel-level framework leveraging extended Berkeley Packet Filter (eBPF) to capture software build dependencies transparently in real time.
arXiv Detail & Related papers (2025-03-03T22:32:59Z)
Code Change Intention, Development Artifact and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM [13.278153690972243]
VulFixMiner and CoLeFunDa focus solely on code changes, neglecting essential context from development artifacts. We propose LLM4VFD, a novel framework that leverages Large Language Models (LLMs) enhanced with Chain-of-Thought reasoning and In-Context Learning.
arXiv Detail & Related papers (2025-01-24T23:40:03Z)
TrustRAG: Enhancing Robustness and Trustworthiness in RAG [31.231916859341865]
TrustRAG is a framework that systematically filters compromised and irrelevant contents before they are retrieved for generation. TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [97.82118821263825]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantic meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries [2.696054049278301]
We introduce DeBinVul, a novel decompiled binary code vulnerability dataset. We fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in detecting binary code vulnerabilities.
arXiv Detail & Related papers (2024-11-07T18:54:31Z)
The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions [12.706661324384319]
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. The adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with pre-defined rules. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++.
arXiv Detail & Related papers (2024-08-14T06:43:06Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code. The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies. We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z)
Software Vulnerability and Functionality Assessment using LLMs [0.8057006406834466]
We investigate whether Large Language Models (LLMs) can aid with code reviews. Our investigation focuses on two tasks that we argue are fundamental to good reviews.
arXiv Detail & Related papers (2024-03-13T11:29:13Z)
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z)
Creating Training Sets via Weak Indirect Supervision [66.77795318313372]
Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources. We formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels. We develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources.
arXiv Detail & Related papers (2021-10-07T14:09:35Z)
Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories [7.629717457706326]
We present an approach that combines practical experience and machine-learning (ML) An advisory record containing key information about a vulnerability is extracted from an advisory. A subset of candidate fix commits is obtained from the source code repository of the affected project.
arXiv Detail & Related papers (2021-03-24T17:50:35Z)
D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools. We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.