In the Magma chamber: Update and challenges in ground-truth vulnerabilities revival for automatic input generator comparison
- URL: http://arxiv.org/abs/2503.19909v1
- Date: Tue, 25 Mar 2025 17:59:27 GMT
- Title: In the Magma chamber: Update and challenges in ground-truth vulnerabilities revival for automatic input generator comparison
- Authors: Timothée Riom, Sabine Houy, Bruno Kreyssig, Alexandre Bartel
- Abstract summary: Magma introduced the notion of forward-porting to reintroduce vulnerable code in current software releases. While their results are promising, the state-of-the-art lacks an update on the maintainability of this approach over time. We characterise the challenges with forward-porting by reassessing the portability of Magma's CVEs four years after its release.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzzing is a well-established technique for detecting bugs and vulnerabilities. With the surge of fuzzers and fuzzing platforms being developed, such as AFL and OSS-Fuzz, comes the need to benchmark these tools' performance. A common problem is that vulnerability benchmarks are based on bugs in old software releases. For this very reason, Magma introduced the notion of forward-porting to reintroduce vulnerable code in current software releases. While their results are promising, the state of the art lacks an update on the maintainability of this approach over time. Indeed, adding the vulnerable code to a recent software version might either break its functionality or make the vulnerable code no longer reachable. We characterise the challenges of forward-porting by reassessing the portability of Magma's CVEs four years after its release and manually reintroducing the vulnerabilities into the current software versions. We find the straightforward process efficient for 17 of the 32 CVEs in our study. We further investigate why a trivial forward-porting process fails for the 15 other CVEs. This involves identifying the commits breaking the forward-porting process and reverting them in addition to the bug fix. While we manage to complete the process for nine of these CVEs, we provide an update on all 15 and explain the challenges we encountered in this process. We thereby give the basis for future work towards a sustainable forward-ported fuzzing benchmark.
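The forward-porting process described in the abstract (revert the bug fix, plus any interfering commits, on top of the current head) can be sketched on a toy repository. Everything below is hypothetical, not taken from Magma: the file contents, commit messages, and CVE identifier are made up for illustration.

```python
# Toy sketch of Magma-style forward-porting: revert the upstream bug-fix
# commit on top of the current head, reintroducing the vulnerable code.
# Repository layout, file contents, and CVE identifier are hypothetical.
import os
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args):
    return subprocess.run(["git", *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout.strip()

git("init", "-q")
git("config", "user.email", "dev@example.com")
git("config", "user.name", "dev")

def commit(code, msg):
    with open(os.path.join(repo, "parser.c"), "w") as f:
        f.write(code)
    git("add", "parser.c")
    git("commit", "-qm", msg)

VULN = (
    "void parse(char *b, int n) {\n"
    "    read_input(b, n);  /* no bounds check */\n"
    "}\n"
    "\n"
    "int main(void) { return 0; }\n"
)
FIXED = VULN.replace("read_input(b, n);  /* no bounds check */",
                     "if (n < 64) read_input(b, n);")

commit(VULN, "initial parser (vulnerable)")
commit(FIXED, "fix CVE-XXXX-YYYY: bound n before reading")
fix_sha = git("rev-parse", "HEAD")

# Unrelated development continues on top of the fix.
commit(FIXED + "\nvoid feature(void) {}\n", "add feature")

# Forward-port the vulnerability: revert only the fix commit. When this
# revert conflicts, the porting process needs manual work, possibly
# reverting the interfering commits as well.
git("revert", "--no-edit", fix_sha)

with open(os.path.join(repo, "parser.c")) as f:
    code = f.read()
print("vulnerability reintroduced:", "no bounds check" in code)
```

In this toy case the revert applies cleanly because the later commit touches a different region of the file; the paper's 15 hard CVEs correspond to the case where it does not, and the interfering commits must first be found and undone.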
Related papers
- Do Large Language Model Benchmarks Test Reliability? [66.1783478365998]
We investigate how well current benchmarks quantify model reliability.
Motivated by this gap in the evaluation of reliability, we propose the concept of so-called platinum benchmarks.
We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks.
arXiv Detail & Related papers (2025-02-05T18:58:19Z) - The Seeds of the FUTURE Sprout from History: Fuzzing for Unveiling Vulnerabilities in Prospective Deep-Learning Libraries [14.260990784121423]
FUTURE is the first universal fuzzing framework tailored for newly introduced and prospective DL libraries.
It uses historical bug information from existing libraries and fine-tunes LLMs for specialized code generation.
It significantly outperforms existing fuzzers in bug detection, success rate of bug reproduction, validity rate of code generation, and API coverage.
arXiv Detail & Related papers (2024-12-02T09:33:28Z) - TransferFuzz: Fuzzing with Historical Trace for Verifying Propagated Vulnerability Code [24.827298607328466]
We introduce TransferFuzz, a novel vulnerability verification framework. It can verify whether vulnerabilities propagated through code reuse can be triggered in new software. It has proven its effectiveness by expanding the impacted software scope for 15 vulnerabilities listed in CVE reports.
arXiv Detail & Related papers (2024-11-27T13:46:39Z) - Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries to the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerability-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z) - Fixing Security Vulnerabilities with AI in OSS-Fuzz [9.730566646484304]
OSS-Fuzz is the most significant and widely used infrastructure for continuous validation of open source systems.
We customise the well-known AutoCodeRover agent for fixing security vulnerabilities.
Our experience with OSS-Fuzz vulnerability data shows that LLM agent autonomy is useful for successful security patching.
arXiv Detail & Related papers (2024-11-03T16:20:32Z) - Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities [46.34031902647788]
We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges.
We introduce new tools and interfaces to improve the agent's ability to find and exploit security vulnerabilities.
Empirical analysis on 390 CTF challenges demonstrates that these new tools and interfaces substantially improve our agent's performance.
arXiv Detail & Related papers (2024-09-24T15:06:01Z) - JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques does not adequately address.
JailbreakBench is an open-sourced benchmark designed to address these challenges.
arXiv Detail & Related papers (2024-03-28T02:44:02Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potentially vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - Multi-Granularity Detector for Vulnerability Fixes [13.653249890867222]
We propose MiDas (Multi-Granularity Detector for Vulnerability Fixes) to identify vulnerability-fixing commits.
MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level.
MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java- and Python-based datasets, respectively.
arXiv Detail & Related papers (2023-05-23T10:06:28Z) - What Happens When We Fuzz? Investigating OSS-Fuzz Bug History [0.9772968596463595]
We analyzed 44,102 reported issues made public by OSS-Fuzz prior to March 12, 2022.
We identified the bug-contributing commits to estimate when the bug-containing code was introduced, and measured the timeline from introduction to detection to fix.
arXiv Detail & Related papers (2023-05-19T05:15:36Z)
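The introduction-to-detection-to-fix timeline measured in the OSS-Fuzz study above reduces to simple date arithmetic per issue. The records below are hypothetical, not drawn from the actual OSS-Fuzz dataset:

```python
from datetime import date

# Hypothetical (introduced, detected, fixed) dates for two issues, in the
# style of the bug lifecycle measured above; not real OSS-Fuzz data.
issues = [
    (date(2021, 3, 1), date(2021, 9, 15), date(2021, 9, 20)),
    (date(2020, 6, 10), date(2022, 1, 5), date(2022, 2, 1)),
]

def timeline(introduced, detected, fixed):
    """Days the bug lay undetected, then days from detection to fix."""
    return (detected - introduced).days, (fixed - detected).days

for introduced, detected, fixed in issues:
    dormant, to_fix = timeline(introduced, detected, fixed)
    print(f"dormant for {dormant} days, fixed {to_fix} days after detection")
```

The hard part in practice is not this arithmetic but the introduction date itself, which must be estimated by tracing each fix back to its bug-contributing commits.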
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.