Related papers: Show Me Why It's Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison

URL: http://arxiv.org/abs/2503.00618v1
Date: Sat, 01 Mar 2025 20:52:49 GMT
Title: Show Me Why It's Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison
Authors: Ruixin Wang, Zhongkai Zhao, Le Fang, Nan Jiang, Yiling Lou, Lin Tan, Tianyi Zhang
Abstract summary: We propose an interactive approach called iFix to facilitate patch understanding and comparison. iFix performs static analysis to identify runtime variables related to the buggy statement. It captures runtime values during execution for each patch, allowing users to compare and contrast their runtime behavior.
Score: 18.933377426587015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated Program Repair (APR) holds the promise of alleviating the burden of debugging and fixing software bugs. Despite this, developers still need to manually inspect each patch to confirm its correctness, which is tedious and time-consuming. This challenge is exacerbated in the presence of plausible patches, which accidentally pass test cases but may not correctly fix the bug. To address this challenge, we propose an interactive approach called iFix to facilitate patch understanding and comparison based on their runtime difference. iFix performs static analysis to identify runtime variables related to the buggy statement and captures their runtime values during execution for each patch. These values are then aligned across different patch candidates, allowing users to compare and contrast their runtime behavior. To evaluate iFix, we conducted a within-subjects user study with 28 participants. Compared with manual inspection and a state-of-the-art interactive patch filtering technique, iFix reduced participants' task completion time by 36% and 33% while also improving their confidence by 50% and 20%, respectively. Besides, quantitative experiments demonstrate that iFix improves the ranking of correct patches by at least 39% compared with other patch ranking methods and is generalizable to different APR tools.
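
The alignment step described in the abstract can be illustrated with a minimal Python sketch. This is not iFix's implementation: the trace format, the rule of aligning values by variable name and occurrence index, and the names `align_runtime_values`, `differing_rows`, and `MISSING` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sentinel for "this patch never recorded this value".
MISSING = object()


def align_runtime_values(traces):
    """Align captured values across patch candidates.

    traces: {patch_name: [(variable, value), ...]} in execution order
    (an assumed format, not the one iFix uses).
    Returns {(variable, occurrence_index): {patch_name: value}}.
    """
    table = defaultdict(dict)
    for patch, events in traces.items():
        seen = defaultdict(int)  # times each variable has been recorded so far
        for var, value in events:
            table[(var, seen[var])][patch] = value
            seen[var] += 1
    return dict(table)


def differing_rows(table, patches):
    """Keep only the rows where the patches disagree.

    A value missing for one patch counts as a difference.
    """
    rows = {}
    for key, by_patch in table.items():
        values = [by_patch.get(p, MISSING) for p in patches]
        if any(v != values[0] for v in values[1:]):
            rows[key] = by_patch
    return rows


# Made-up traces for two plausible patches of the same bug.
traces = {
    "patch_a": [("idx", 0), ("total", 3), ("idx", 1), ("total", 7)],
    "patch_b": [("idx", 0), ("total", 3), ("idx", 1), ("total", 8)],
}
table = align_runtime_values(traces)
print(differing_rows(table, ["patch_a", "patch_b"]))
# {('total', 1): {'patch_a': 7, 'patch_b': 8}}
```

Showing only the rows that differ is one simple way to direct a reviewer to the point where two plausible patches diverge at runtime; a real tool would also need to handle loops and branches whose executions do not line up one to one.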

Related papers

Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study [20.46588369793562]
The most popular benchmarks for automated issue solving are SWE-bench and its human-filtered subset, SWE-bench Verified. This paper presents an in-depth empirical study of the correctness of plausible patches generated by three state-of-the-art issue-solving tools evaluated on SWE-bench Verified.
arXiv Detail & Related papers (2025-03-19T14:02:21Z)
Ranking Plausible Patches by Historic Feature Frequencies [4.129445293427074]
This paper presents PrevaRank, a technique that ranks plausible patches according to their feature similarity with historic programmer-written fixes for similar bugs. PrevaRank consistently improved the ranking of correct fixes. It works robustly with a variety of APR tools and bugs, with negligible overhead.
arXiv Detail & Related papers (2024-07-24T12:58:14Z)
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs [23.419180504723546]
ContrastRepair is a novel APR approach that augments conversation-driven APR by providing contrastive test pairs. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java.
arXiv Detail & Related papers (2024-03-04T12:15:28Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval [49.45246833329707]
We re-examine the "matching" nature of Anomaly Detection (AD). We propose a new AD framework that simultaneously sets new records in AD accuracy and runs at remarkably high speed.
arXiv Detail & Related papers (2023-08-13T11:49:05Z)
Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair. Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug. We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
arXiv Detail & Related papers (2023-08-01T05:22:10Z)
Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks present a challenge in which the model's performance degrades when the training set and the test set follow different distributions. We propose a novel method called patch-aware batch normalization (PBN). By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
Test-based Patch Clustering for Automatically-Generated Patches Assessment [21.051652050359852]
Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior.
arXiv Detail & Related papers (2022-07-22T13:39:27Z)
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking [57.42241521034744]
We propose the concept of certified error control of candidate set pruning for relevance ranking. Our method successfully prunes the first-stage retrieved candidate sets to improve the second-stage reranking speed.
arXiv Detail & Related papers (2022-05-19T16:00:13Z)
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors. We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks. We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)
Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas. We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data. Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data. BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.