Related papers: CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

URL: http://arxiv.org/abs/2406.13972v2
Date: Mon, 8 Jul 2024 10:28:20 GMT
Title: CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors
Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin,
Abstract summary: It is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially causing data leakage. Our work assesses the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR) To fully harness LLMs' conversational capabilities and the benefits of augmented information, we introduce a novel conversational semi-automatic repair framework CREF assisting human tutor.
Score: 8.415004837059863
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially causing data leakage. To evaluate LLMs' realistic repair capabilities, (1) we introduce an extensive, non-crawled benchmark, referred to as TutorCode, comprising 1,239 C++ defect codes and associated information such as tutor guidance, solution description, failing test cases, and the corrected code. Our work assesses the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). (2) We then provide a comprehensive investigation into which types of extra information can help LLMs improve their performance in repairing defects. Among these types, tutor guidance was found to be the most effective information in enhancing LLM repair capabilities. To fully harness LLMs' conversational capabilities and the benefits of augmented information, (3) we introduce a novel conversational semi-automatic repair framework CREF assisting human tutor. It demonstrates a remarkable AVG-5 improvement of 17.2%-24.6% compared to the baseline, achieving an impressive AVG-5 of 76.6% when utilizing GPT-4. These results highlight the potential for enhancing LLMs' repair capabilities through interactions with tutors and historical conversations involving incorrect responses. The successful application of CREF in a real-world educational setting demonstrates its effectiveness in reducing tutors' workload and improving students' learning experience, while also showcasing its promise for facilitating other software engineering tasks, such as code review.

Related papers

Studying and Understanding the Effectiveness and Failures of Conversational LLM-Based Repair [3.93048798243871]
Automated program repair (APR) is designed to automate the process of bug-fixing. Advanced APR techniques powered by conversational language models (LLMs) have exhibited impressive repair abilities. Despite the superiority, conversational APR techniques still fail to repair a large number of bugs.
arXiv Detail & Related papers (2025-03-19T09:39:32Z)
FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments [21.848112758958543]
We propose FastFixer, an efficient and effective approach for programming assignment repair. We first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context. Considering the repair efficiency, FastFixer achieves a remarkable inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
arXiv Detail & Related papers (2024-10-11T10:17:02Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
How Far Can We Go with Practical Function-Level Program Repair? [11.71750828464698]
This paper investigates the effect of few-shot learning mechanism and the auxiliary repair-relevant information on function-level APR. We propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the power of the auxiliary repair-relevant information.
arXiv Detail & Related papers (2024-04-19T12:14:09Z)
Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs [16.890411067079885]
Large language models (LLMs) have demonstrated remarkable capabilities on a broad spectrum of downstream tasks. We propose a novel perspective on the learning focus of LLM fine-tuning for program repair. We apply MORepair to fine-tune four open-source LLMs with different sizes and architectures.
arXiv Detail & Related papers (2024-04-19T05:36:21Z)
Aligning LLMs for FL-free Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective. Based on this insight, we designed D4C, a straightforward prompting framework for APR.
arXiv Detail & Related papers (2024-04-13T02:36:40Z)
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimize both correctness and alignment using reinforcement learning (RL) Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO)
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG. InFO-RAG is low-cost and general across various tasks. It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction [62.409807640887834]
Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus. We rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC.
arXiv Detail & Related papers (2024-02-18T01:40:34Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model [15.885824575879763]
Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments.
arXiv Detail & Related papers (2023-12-29T06:12:15Z)
Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.