Related papers: Comprehension-Performance Gap in GenAI-Assisted Brownfield Programming: A Replication and Extension

URL: http://arxiv.org/abs/2511.02922v1
Date: Tue, 04 Nov 2025 19:03:55 GMT
Title: Comprehension-Performance Gap in GenAI-Assisted Brownfield Programming: A Replication and Extension
Authors: Yunhan Qiao, Christopher Hundhausen, Summit Haque, Md Istiak Hossain Shihab,
Abstract summary: Code comprehension is essential for brownfield programming tasks. Generative AI (GenAI) coding assistants such as GitHub Copilot have been shown to improve developer productivity. We explore both performance and comprehension in GenAI-assisted brownfield programming tasks.
Score: 0.41998444721319217
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code comprehension is essential for brownfield programming tasks, in which developers maintain and enhance legacy code bases. Generative AI (GenAI) coding assistants such as GitHub Copilot have been shown to improve developer productivity, but their impact on code understanding is less clear. We replicate and extend a previous study by exploring both performance and comprehension in GenAI-assisted brownfield programming tasks. In a within-subjects experimental study, 18 computer science graduate students completed feature implementation tasks with and without Copilot. Results show that Copilot significantly reduced task time and increased the number of test cases passed. However, comprehension scores did not differ across conditions, revealing a comprehension-performance gap: participants passed more test cases with Copilot, but did not demonstrate greater understanding of the legacy codebase. Moreover, we failed to find a correlation between comprehension and task performance. These findings suggest that while GenAI tools can accelerate programming progress in a legacy codebase, such progress may come without an improved understanding of that codebase. We consider the implications of these findings for programming education and GenAI tool design.

Related papers

Alignment with Fill-In-the-Middle for Enhancing Code Generation [56.791415642365415]
We propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases. Our approach demonstrates significant improvements in code generation tasks, as validated by experiments on benchmark datasets such as HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench.
arXiv Detail & Related papers (2025-08-27T03:15:53Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
The Effects of GitHub Copilot on Computing Students' Programming Effectiveness, Efficiency, and Processes in Brownfield Programming Tasks [0.6282171844772422]
GitHub Copilot is a generative artificial intelligence (GenAI) coding assistant. This paper investigates how GitHub Copilot influences undergraduate students' programming performance, behaviors, and understanding.
arXiv Detail & Related papers (2025-06-11T16:18:53Z)
From Developer Pairs to AI Copilots: A Comparative Study on Knowledge Transfer [8.567835367628787]
With the rise of AI coding assistants, developers now not only work with human partners but also, as some claim, with AI pair programmers. To analyze knowledge transfer in both human-human and human-AI settings, we conducted an empirical study. We found a similar frequency of successful knowledge transfer episodes and overlapping topical categories across both settings.
arXiv Detail & Related papers (2025-06-05T09:13:30Z)
Benchmarking ChatGPT, Codeium, and GitHub Copilot: A Comparative Study of AI-Driven Programming and Debugging Assistants [0.0]
Large language models (LLMs) have become essential for tasks like code generation, bug fixing, and optimization. This paper presents a comparative study of ChatGPT, Codeium, and GitHub Copilot, evaluating their performance on LeetCode problems.
arXiv Detail & Related papers (2024-09-30T03:53:40Z)
The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers [1.995977018536036]
Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Many novices are now programming with generative AI (GenAI). Our findings show an unfortunate divide in the use of GenAI tools between students who accelerated and students who struggled.
arXiv Detail & Related papers (2024-05-28T01:48:28Z)
Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks. Existing automatic prompt design methods are very limited to code intelligence tasks. We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
Choose Your Programming Copilot: A Comparison of the Program Synthesis Performance of GitHub Copilot and Genetic Programming [2.2559617939136505]
GitHub Copilot is an extension for the Visual Studio Code development environment powered by the large-scale language model Codex. In this paper, we evaluate GitHub Copilot on standard program synthesis benchmark problems and compare the achieved results with those from the genetic programming literature. We find that the performance of the two approaches on the benchmark problems is quite similar; however, in comparison to GitHub Copilot, the program synthesis approaches based on genetic programming are not yet mature enough.
arXiv Detail & Related papers (2021-11-15T16:30:12Z)
Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.