Teaching Code Refactoring Using LLMs
- URL: http://arxiv.org/abs/2508.09332v1
- Date: Tue, 12 Aug 2025 20:41:19 GMT
- Title: Teaching Code Refactoring Using LLMs
- Authors: Anshul Khairnar, Aarya Rajoju, Edward F. Gehringer
- Abstract summary: Large Language Models (LLMs) can enhance the teaching of code refactoring in software engineering courses through real-time, context-aware feedback. Refactoring improves code quality but is difficult to teach, especially with complex, real-world codebases.
- Score: 0.7407754140732635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This Innovative Practice full paper explores how Large Language Models (LLMs) can enhance the teaching of code refactoring in software engineering courses through real-time, context-aware feedback. Refactoring improves code quality but is difficult to teach, especially with complex, real-world codebases. Traditional methods like code reviews and static analysis tools offer limited, inconsistent feedback. Our approach integrates LLM-assisted refactoring into a course project using structured prompts to help students identify and address code smells such as long methods and low cohesion. Implemented in Spring 2025 in a long-lived OSS project, the intervention is evaluated through student feedback and planned analysis of code quality improvements. Findings suggest that LLMs can bridge theoretical and practical learning, supporting a deeper understanding of maintainability and refactoring principles.
Related papers
- From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models [5.31828955342405]
Large language models (LLMs) are increasingly used for automated code tasks.
This article systematically studies the capabilities of LLMs for code readability.
arXiv Detail & Related papers (2026-02-25T12:05:25Z)
- From Human to Machine Refactoring: Assessing GPT-4's Impact on Python Class Quality and Readability [46.83143241367452]
Refactoring aims to improve code quality without altering program behavior.
Recent advances in Large Language Models (LLMs) have introduced new opportunities for automated code preservation.
We present an empirical study on LLM-driven class refactoring using GPT-4o, applied to 100 Python classes from the ClassEval benchmark.
Our findings show that GPT-4o generally produces behavior-preserving refactorings that reduce code smells and improve quality metrics, albeit at the cost of decreased readability.
arXiv Detail & Related papers (2026-01-19T15:22:37Z)
- CodeSimpleQA: Scaling Factuality in Code Large Language Models [55.705748501461294]
We present CodeSimpleQA, a comprehensive benchmark designed to evaluate the factual accuracy of code LLMs in answering code-related questions.
We also create CodeSimpleQA-Instruct, a large-scale instruction corpus with 66M samples, and develop a post-training framework combining supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-12-22T14:27:17Z)
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code.
We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs.
We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z)
- Refactoring with LLMs: Bridging Human Expertise and Machine Understanding [5.2993089947181735]
We draw on Martin Fowler's guidelines to design instruction strategies for 61 well-known transformation types.
We evaluate these strategies on benchmark examples and real-world code snippets from GitHub projects.
While descriptive instructions are more interpretable to humans, our results show that rule-based instructions often lead to better performance in specific scenarios.
arXiv Detail & Related papers (2025-10-04T19:40:42Z)
- Large Language Model Unlearning for Source Code [65.42425213605114]
PROD is a novel unlearning approach that enables LLMs to forget undesired code content while preserving their code generation capabilities.
Our evaluation demonstrates that PROD achieves a superior balance between forget quality and model utility compared to existing unlearning approaches.
arXiv Detail & Related papers (2025-06-20T16:27:59Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.
We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation [10.77747590700758]
Large language models (LLMs) have achieved significant advancements in software mining.
Handling the syntactic structure of source code remains a challenge.
This paper employs in-context learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z)
- Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
In large language models (LLMs), code and reasoning reinforce each other.
Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation.
We identify key challenges and propose future research directions to strengthen this synergy.
arXiv Detail & Related papers (2025-02-26T18:55:42Z)
- A Survey on Evaluating Large Language Models in Code Generation Tasks [30.256255254277914]
This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks.
With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation.
arXiv Detail & Related papers (2024-08-29T12:56:06Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [92.62952504133926]
This study evaluated the performance of three leading closed-source LLMs and six popular open-source LLMs on three commonly used benchmarks.
We developed a taxonomy of bugs for incorrect code and analyzed the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Refactoring Deep Learning Code: A Study of Practices and Unsatisfied Tool Needs [10.440289439181756]
Deep learning software has become progressively complex as the software evolves.
The insight of code refactoring in the context of deep learning is still unclear.
Research and the development of related tools are crucial for improving project maintainability and code quality.
arXiv Detail & Related papers (2024-05-08T07:35:14Z)
- AI-powered Code Review with LLMs: Early Results [10.37036924997437]
We present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model.
Our proposed LLM-based AI agent model is trained on large code repositories.
It aims to detect code smells, identify potential bugs, provide suggestions for improvement, and optimize the code.
arXiv Detail & Related papers (2024-04-29T08:27:50Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only by masking the unexecuted code segments, providing Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.