Diffploit: Facilitating Cross-Version Exploit Migration for Open Source Library Vulnerabilities
- URL: http://arxiv.org/abs/2511.12950v1
- Date: Mon, 17 Nov 2025 04:06:01 GMT
- Title: Diffploit: Facilitating Cross-Version Exploit Migration for Open Source Library Vulnerabilities
- Authors: Zirui Chen, Zhipeng Xue, Jiayuan Zhou, Xing Hu, Xin Xia, Xiaohu Yang,
- Abstract summary: We propose Diffploit, an iterative, diff-driven exploit migration method structured around two key modules.<n>We evaluate Diffploit on a large-scale dataset containing 102 Java CVEs and 689 version-migration tasks across 79 libraries.<n>It successfully migrates 84.2% exploits, outperforming the change-aware test repair tool TARGET by 52.0% and the rule-based tool in IDEA by 61.6%.
- Score: 13.559398564795048
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Exploits are commonly used to demonstrate the presence of library vulnerabilities and validate their impact across different versions. However, their direct application to alternative versions often fails due to breaking changes introduced during evolution. These failures stem from both changes in triggering conditions (e.g., API refactorings) and broken dynamic environments (e.g., build or runtime errors), which are challenging to interpret and adapt manually. Existing techniques primarily focus on code-level trace alignment through fuzzing, which is both time-consuming and insufficient for handling environment-level failures. Moreover, they often fall short when dealing with complicated triggering condition changes across versions. To overcome this, we propose Diffploit, an iterative, diff-driven exploit migration method structured around two key modules: the Context Module and the Migration Module. The Context Module dynamically constructs contexts derived from analyzing behavioral discrepancies between the target and reference versions, which capture the failure symptom and its related diff hunks. Leveraging these contexts, the Migration Module guides an LLM-based adaptation through an iterative feedback loop, balancing exploration of diff candidates and gradual refinement to resolve reproduction failures effectively. We evaluate Diffploit on a large-scale dataset containing 102 Java CVEs and 689 version-migration tasks across 79 libraries. Diffploit successfully migrates 84.2% exploits, outperforming the change-aware test repair tool TARGET by 52.0% and the rule-based tool in IDEA by 61.6%. Beyond technical effectiveness, Diffploit identifies 5 CVE reports with incorrect affected version ranges, three of which have been confirmed. It also discovers 111 unreported vulnerable versions in the GitHub Advisory Database.
Related papers
- Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems.<n>Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z) - DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems [48.971606069204825]
DoVer is an intervention-driven debug framework for large language model (LLM)-based multi-agent systems.<n>It augments hypothesis generation with active verification through targeted interventions.<n>DoVer flips 18-28% of failed trials into successes, achieves up to 16% milestone progress, and validates or refutes 30-60% of failure hypotheses.
arXiv Detail & Related papers (2025-12-07T09:23:48Z) - What a diff makes: automating code migration with large language models [0.15293427903448018]
We show that contexts containing diffs can significantly improve performance against out of the box LLMs.<n>We provide a dataset to assist in further development of this problem area, as well as an open-source Python package, AIMigrate, that can be used to assist with migrating code bases.<n>In a real-world migration of TYPHOIDSIM between STARSIM versions, AIMigrate correctly identified 65% of required changes in a single run, increasing to 80% with multiple runs, with 47% of changes generated perfectly.
arXiv Detail & Related papers (2025-10-31T18:08:52Z) - InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose.<n>We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z) - Improving IR-based Bug Localization with Semantics-Driven Query Reduction [0.9298382208776371]
We propose IQLoc, a novel approach to localize software bugs against bug reports.<n>We leverage the program semantics understanding of transformer-based models to reason about the suspiciousness of code.<n> IQLoc improves MAP by 91.67% for bug reports with stack traces, 72.73% for those that include code elements, and 65.38% for those containing only descriptions in natural language.
arXiv Detail & Related papers (2025-10-06T03:43:38Z) - Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks.<n>They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions.<n>Current systems lack a framework that can comprehensively understand agent error in a modular and systemic way.<n>We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z) - Bridging Solidity Evolution Gaps: An LLM-Enhanced Approach for Smart Contract Compilation Error Resolution [2.967464333639626]
Solidity, the dominant smart contract language, has rapidly evolved with frequent version updates to enhance security, functionality, and developer experience.<n>We conduct an empirical study to investigate the challenges in the Solidity version evolution and reveal that 81.68% of examined contracts encounter errors when compiled across different versions, with 86.92% of compilation errors.<n>We introduce SMCFIXER, a novel framework that integrates expert knowledge retrieval with LLM-based repair mechanisms for Solidity compilation error resolution.
arXiv Detail & Related papers (2025-08-14T10:42:26Z) - CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - CODEMENV: Benchmarking Large Language Models on Code Migration [11.735053997817765]
CODEMENV consists of 922 examples spanning 19 Python and Java packages.<n>It covers three core tasks: identifying functions incompatible with specific versions, detecting changes in function definitions, and adapting code to target environments.<n> Experimental evaluation with seven LLMs on CODEMENV yields an average pass@1 rate of 26.50%, with GPT-4O achieving the highest score at 43.84%.
arXiv Detail & Related papers (2025-06-01T08:29:59Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - Enhancing IR-based Fault Localization using Large Language Models [5.032687557488094]
This paper enhances Fault Localization (IRFL) by categorizing bug reports based on programming entities, stack traces, and natural language text.<n>To address inaccuracies in queries, we introduce a user and conversational-based query reformulation approach, termed LLmiRQ+.<n> Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
arXiv Detail & Related papers (2024-12-04T22:47:51Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.<n>We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.<n>We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.