Related papers: Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair

URL: http://arxiv.org/abs/2505.13103v2
Date: Sat, 24 May 2025 11:48:30 GMT
Title: Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair
Authors: Han Zheng, Ilia Shumailov, Tianqi Fan, Aiden Hall, Mathias Payer,
Abstract summary: We propose crash-site repair to simplify the repair task while still mitigating the risk of exploitation. We introduce a template-guided patch generation approach that significantly reduces the token cost of Large Language Models (LLMs). Our results show that, when combined with the top-performing agent CodeRover-S, WILLIAMT reduces token cost by 45.9% and increases the bug-fixing rate to 73.5% (+29.6%) on ARVO.
Score: 26.981770213053004
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of bug-finding techniques has led to the discovery of more vulnerabilities than developers can reasonably fix, creating an urgent need for effective Automated Program Repair (APR) methods. However, the complexity of modern bugs often makes precise root cause analysis difficult and unreliable. To address this challenge, we propose crash-site repair to simplify the repair task while still mitigating the risk of exploitation. In addition, we introduce a template-guided patch generation approach that significantly reduces the token cost of Large Language Models (LLMs) while maintaining both efficiency and effectiveness. We implement our prototype system, WILLIAMT, and evaluate it against state-of-the-art APR tools. Our results show that, when combined with the top-performing agent CodeRover-S, WILLIAMT reduces token cost by 45.9% and increases the bug-fixing rate to 73.5% (+29.6%) on ARVO, a ground-truth open source software vulnerabilities benchmark. Furthermore, we demonstrate that WILLIAMT can function effectively even without access to frontier LLMs: even a local model running on a Mac M4 Mini achieves a reasonable repair rate. These findings highlight the broad applicability and scalability of WILLIAMT.

Related papers

Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language. We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program. We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search [6.314858275160081]
APRMCTS incorporates Monte Carlo Tree Search (MCTS) into patch searching by performing a global evaluation of the explored patches and selecting the most promising one for subsequent refinement and generation. Our experiments on 835 bugs from Defects4J demonstrate that, when integrated with GPT-3.5, APRMCTS can fix a total of 201 bugs, which outperforms all state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-02T15:44:12Z)
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models [48.073219761367184]
We investigate an APR pipeline that balances the generation of multiple outputs and multiple rounds of iteration. We fine-tune each model on an APR dataset with three sizes (1K, 30K, 65K) and two techniques (Full Fine-Tuning and LoRA). Our results show that by using only a fraction (1%) of the fine-tuning dataset, we can achieve improvements of up to 78% in the number of plausible patches generated.
arXiv Detail & Related papers (2025-05-05T18:06:51Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z)
LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models [9.946058168276744]
Large Language Models (LLM) have opened up the possibility for many software defects to be patched automatically. We propose an iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We achieve a human-verified quality score of 8.51/10 and an increase in groundtruth code similarity of 20% with Llama 3 70B.
arXiv Detail & Related papers (2025-01-07T00:21:42Z)
There are More Fish in the Sea: Automated Vulnerability Repair via Binary Templates [4.907610470063863]
We propose a template-based automated vulnerability repair approach for Java binaries. Experiments on the Vul4J dataset demonstrate that TemVUR successfully repairs 11 vulnerabilities. To assess the generalizability of TemVUR, we curate the ManyVuls4J dataset.
arXiv Detail & Related papers (2024-11-27T06:59:45Z)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
APPATCH: Automated Adaptive Prompting Large Language Models for Real-World Software Vulnerability Patching [24.958856670970366]
In this paper, we leverage the power and merits of pre-trained large language models (LLMs) to enable automated vulnerability patching. To elicit LLMs to effectively reason about vulnerable code behaviors, we introduce vulnerability semantics reasoning and adaptive prompting. Our evaluation of APPATCH on 97 zero-day vulnerabilities and 20 existing vulnerabilities demonstrates its superior performance to both existing methods and state-of-the-art non-LLM-based techniques.
arXiv Detail & Related papers (2024-08-24T14:51:50Z)
On The Effectiveness of Dynamic Reduction Techniques in Automated Program Repair [1.7767466724342067]
We describe a program repair framework that effectively handles large-scale buggy programs of industrial complexity. The framework exploits program reduction in the form of program slicing to eliminate parts of the code irrelevant to the bug being repaired. Our empirical results on the widely used Defects4J dataset reveal that a substantial improvement in performance can be obtained without any degradation in repair quality.
arXiv Detail & Related papers (2024-06-23T21:35:07Z)
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. This paper introduces an innovative APR approach called GIANTREPAIR. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z)
SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. Its direct impact has been observed as a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)