Execution-free Program Repair
- URL: http://arxiv.org/abs/2405.01309v2
- Date: Thu, 9 May 2024 12:26:51 GMT
- Title: Execution-free Program Repair
- Authors: Li Huang, Bertrand Meyer, Ilgiz Mustafin, Manuel Oriol
- Abstract summary: The novel idea in the Proof2Fix methodology and tool presented here is to rely instead on a program prover, without the need to run tests or to run the program at all.
Results show that Proof2Fix finds and fixes significant historical bugs.
- Score: 31.162584619997393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic program repair usually relies heavily on test cases for both bug identification and fix validation. The issue is that writing test cases is tedious, running them takes much time, and validating a fix through tests does not guarantee its correctness. The novel idea in the Proof2Fix methodology and tool presented here is to rely instead on a program prover, without the need to run tests or to run the program at all. Results show that Proof2Fix finds and fixes significant historical bugs.
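To illustrate the core idea of validating a fix with a prover rather than a test suite, here is a toy Python sketch in which an exhaustive bounded check of a contract stands in for a real program prover (Proof2Fix itself targets Eiffel; the `clamp` routine and contract below are illustrative, not from the paper):

```python
from itertools import product

# Hypothetical buggy routine: clamp a value into the interval [lo, hi].
def clamp_buggy(x, lo, hi):
    return max(x, min(lo, hi))  # bug: min is applied to the bounds, not to x

def clamp_fixed(x, lo, hi):
    return max(lo, min(x, hi))  # candidate fix

# Contract (postcondition) that a prover would try to discharge:
# the result stays in [lo, hi], and if x was already in range it is unchanged.
def postcondition(x, lo, hi, result):
    in_range = lo <= x <= hi
    return lo <= result <= hi and (not in_range or result == x)

# In place of a real prover, a bounded exhaustive check over a small domain
# decides whether the contract holds for every input -- no test suite needed.
def verify(fn, domain=range(-3, 4)):
    return all(
        postcondition(x, lo, hi, fn(x, lo, hi))
        for x, lo, hi in product(domain, repeat=3)
        if lo <= hi
    )
```

The buggy version fails verification (e.g. `clamp_buggy(5, 0, 3)` returns 5) while the fixed version passes, mirroring how a contract violation identifies the bug and contract satisfaction validates the fix.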
Related papers
- Combining Tests and Proofs for Better Software Verification [30.905701881162685]
A different perspective is emerging, in which testing and proving are complementary rather than competing techniques for producing software of verified quality. A counterexample is an input combination that makes the program fail. One can, however, apply counterexample generation to incorrect programs, as a tool for automatic test generation. We can also use these mechanisms to help produce program fixes for incorrect programs, with a guarantee that the fixes are correct.
arXiv Detail & Related papers (2026-01-21T22:39:17Z) - Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language. We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program. We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - UTFix: Change Aware Unit Test Repairing using LLM [24.12850207529614]
We present UTFix, a novel approach for repairing unit tests when their corresponding focal methods undergo changes.
Our approach leverages language models to repair unit tests by providing contextual information such as static code slices, dynamic code slices, and failure messages.
To the best of our knowledge, this is the first comprehensive study focused on unit test repair in evolving Python projects.
arXiv Detail & Related papers (2025-03-19T06:10:03Z) - Show Me Why It's Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison [18.933377426587015]
We propose an interactive approach called iFix to facilitate patch understanding and comparison.
iFix performs static analysis to identify runtime variables related to the buggy statement.
It captures runtime values during execution for each patch, allowing users to compare and contrast their runtime behavior.
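Capturing runtime values for patch comparison can be sketched in a few lines of Python using the standard `sys.settrace` hook; the two candidate patches below are illustrative stand-ins, not taken from the iFix paper:

```python
import sys

# Record the values local variables take while a function runs, so two
# candidate patches can be compared side by side on the same input.
def trace_locals(fn, *args):
    snapshots = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            snapshots.append(dict(frame.f_locals))  # snapshot current locals
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, snapshots

def patch_a(xs):  # candidate fix 1: skips the first element
    total = 0
    for x in xs[1:]:
        total += x
    return total

def patch_b(xs):  # candidate fix 2: sums every element
    total = 0
    for x in xs:
        total += x
    return total
```

Diffing the two snapshot lists on the same input exposes exactly where the patches' runtime behavior diverges, which is the kind of contrast an interactive tool can present to the user.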
arXiv Detail & Related papers (2025-03-01T20:52:49Z) - Towards Practical and Useful Automated Program Repair for Debugging [4.216808129651161]
PracAPR is an interactive repair system that works in an Integrated Development Environment (IDE).
PracAPR does not require a test suite or program re-execution.
arXiv Detail & Related papers (2024-07-12T03:19:54Z) - VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z) - NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z) - Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGet, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGet treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
arXiv Detail & Related papers (2024-01-12T18:56:57Z) - Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that are either simple (e.g. unit tests) or require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of generating such test cases by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z) - Lyra: Orchestrating Dual Correction in Automated Theorem Proving [63.115422781158934]
Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
arXiv Detail & Related papers (2023-09-27T17:29:41Z) - FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair [0.5749787074942512]
Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test.
In this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis.
One key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category.
arXiv Detail & Related papers (2023-06-21T19:34:16Z) - FixEval: Execution-based Evaluation of Program Fixes for Programming Problems [23.987104440395576]
We introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes.
FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes.
Our experiments show that match-based metrics do not reflect model-generated program fixes accurately.
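The gap between match-based metrics and execution-based evaluation can be shown with a toy Python example (the reference fix, candidate fix, and unit tests below are illustrative, not drawn from the FixEval benchmark):

```python
# A candidate fix can be semantically correct yet textually different
# from the reference fix, so exact match under-counts correct repairs.
reference_fix = "def absval(x): return -x if x < 0 else x"
candidate_fix = "def absval(x): return abs(x)"

def exact_match(candidate, reference):
    # match-based metric: string equality with the reference patch
    return candidate.strip() == reference.strip()

def passes_tests(candidate, tests):
    # execution-based metric: run the candidate against unit tests
    env = {}
    exec(candidate, env)
    fn = env["absval"]
    return all(fn(x) == expected for x, expected in tests)

tests = [(-3, 3), (0, 0), (7, 7)]
```

Here `exact_match` rejects the candidate while the unit tests accept it, which is precisely the mismatch the paper reports for match-based metrics.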
arXiv Detail & Related papers (2022-06-15T20:18:43Z) - Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
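The two ideas behind BIFI can be sketched on a toy "language" where a correct program is a balanced-parenthesis string; the critic, breaker, and fixer below are simple stand-ins for the learned models in the paper:

```python
import random

def critic(s):
    # checks correctness (here: parentheses are balanced)
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1 if ch == ")" else 0
        if depth < 0:
            return False
    return depth == 0

def breaker(s, rng):
    # corrupts a good example by dropping one character
    i = rng.randrange(len(s))
    return s[:i] + s[i + 1:]

def fixer(s):
    # naive fixer: pad missing parentheses on the appropriate side
    opens, closes = s.count("("), s.count(")")
    return s + ")" * (opens - closes) if opens > closes else "(" * (closes - opens) + s

def bifi_round(good_examples, bad_inputs, rng):
    paired = []
    # idea 1: run the fixer on real bad inputs; the critic keeps only
    # outputs it accepts, yielding genuine (bad, fixed) training pairs
    for bad in bad_inputs:
        fixed = fixer(bad)
        if critic(fixed):
            paired.append((bad, fixed))
    # idea 2: run the breaker on good examples to synthesize more pairs
    for good in good_examples:
        broken = breaker(good, rng)
        if not critic(broken):
            paired.append((broken, good))
    return paired  # training data for the next fixer iteration
```

Iterating `bifi_round` and retraining the breaker and fixer on the accumulated pairs is the loop the paper describes; here one round already produces critic-verified pairs from unlabeled inputs.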
arXiv Detail & Related papers (2021-06-11T20:31:04Z) - Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback.
We then apply a graph neural network on top to model the reasoning process.
We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.