Related papers: Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles

Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles

URL: http://arxiv.org/abs/2506.08173v1
Date: Mon, 09 Jun 2025 19:36:40 GMT
Title: Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles
Authors: Nguyen Phu Vinh, Anh Chung Hoang, Chris Ngo, Truong-Son Hy,
Abstract summary: Large Language Models (LLMs) have shown strong capabilities in code generation and comprehension, yet their application to complex software engineering tasks often suffers from low precision and limited interpretability.<n>We present Repeton, a fully open-source framework that leverages LLMs for precise and automated code manipulation in real-world Git.
Score: 1.387448620257867
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have shown strong capabilities in code generation and comprehension, yet their application to complex software engineering tasks often suffers from low precision and limited interpretability. We present Repeton, a fully open-source framework that leverages LLMs for precise and automated code manipulation in real-world Git repositories. Rather than generating holistic fixes, Repeton operates through a structured patch-and-test pipeline: it iteratively diagnoses issues, proposes code changes, and validates each patch through automated testing. This stepwise process is guided by lightweight heuristics and development tools, avoiding reliance on embedding-based retrieval systems. Evaluated on the SWE-bench Lite benchmark, our method shows good performance compared to RAG-based methods in both patch validity and interpretability. By decomposing software engineering tasks into modular, verifiable stages, Repeton provides a practical path toward scalable and transparent autonomous debugging.

Related papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment [0.0]
We introduce RePaCA, a novel static APCA technique that leverages Large Language Models (LLMs) specialized in thinking tasks.<n>Our approach achieves state-of-the-art performance, with 83.1% accuracy and an 84.8% F1-score.
arXiv Detail & Related papers (2025-07-30T11:21:09Z)
Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements [0.36832029288386137]
This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) into software development.<n>A static code analysis framework detects issues such as bugs, vulnerabilities, and code smells within a large-scale software project.<n>Retrieval-augmented generation (RAG) is implemented to enhance the relevance and precision of the revisions.
arXiv Detail & Related papers (2025-06-12T03:39:25Z)
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs)<n>Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models [81.12673534903979]
Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools.<n>We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task.
arXiv Detail & Related papers (2025-02-17T03:42:28Z)
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer [0.0]
SuperCoder2.0 is an advanced autonomous system designed to enhance software development through artificial intelligence. System combines an AI-native development approach with intelligent agents to enable fully autonomous coding.
arXiv Detail & Related papers (2024-09-17T13:44:42Z)
Fix the Tests: Augmenting LLMs to Repair Test Cases with Static Collector and Neural Reranker [9.428021853841296]
We propose SYNTER, a novel approach to automatically repair obsolete test cases via precise and concise TROCtxs construction. With the augmentation of constructed TROCtxs, hallucinations are reduced by 57.1%.
arXiv Detail & Related papers (2024-07-04T04:24:43Z)
A Unified Debugging Approach via LLM-Based Multi-Agent Synergy [39.11825182386288]
FixAgent is an end-to-end framework for unified debug through multi-agent synergy. It significantly outperforms state-of-the-art repair methods, fixing 1.25$times$ to 2.56$times$ bugs on the repo-level benchmark, Defects4J.
arXiv Detail & Related papers (2024-04-26T04:55:35Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.