Related papers: How do Copilot Suggestions Impact Developers' Frustration and Productivity?

How do Copilot Suggestions Impact Developers' Frustration and Productivity?

URL: http://arxiv.org/abs/2504.06808v1
Date: Wed, 09 Apr 2025 11:55:22 GMT
Title: How do Copilot Suggestions Impact Developers' Frustration and Productivity?
Authors: Emanuela Guglielmi, Venera Arnoudova, Gabriele Bavota, Rocco Oliveto, Simone Scalabrino,
Abstract summary: We propose two theories on the impact of automatic suggestions on frustration and productivity.<n>We will involve at least 32 developers, both experts and novices.
Score: 12.302518927205103
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Context. AI-based development tools, such as GitHub Copilot, are transforming the software development process by offering real-time code suggestions. These tools promise to improve the productivity by reducing cognitive load and speeding up task completion. Previous exploratory studies, however, show that developers sometimes perceive the automatic suggestions as intrusive. As a result, they feel like their productivity decreased. Theory. We propose two theories on the impact of automatic suggestions on frustration and productivity. First, we hypothesize that experienced developers are frustrated from automatic suggestions (mostly from irrelevant ones), and this also negatively impacts their productivity. Second, we conjecture that novice developers benefit from automatic suggestions, which reduce the frustration caused from being stuck on a technical problem and thus increase their productivity. Objective. We plan to conduct a quasi-experimental study to test our theories. The empirical evidence we will collect will allow us to either corroborate or reject our theories. Method. We will involve at least 32 developers, both experts and novices. We will ask each of them to complete two software development tasks, one with automatic suggestions enabled and one with them disabled, allowing for within-subject comparisons. We will measure independent and dependent variables by monitoring developers' actions through an IDE plugin and screen recording. Besides, we will collect physiological data through a wearable device. We will use statistical hypothesis tests to study the effects of the treatments (i.e., automatic suggestions enabled/disabled) on the outcomes (frustration and productivity).

Related papers

"Maybe We Need Some More Examples:" Individual and Team Drivers of Developer GenAI Tool Use [7.989553944867326]
Despite widespread availability of generative AI tools, developer adoption remains uneven.<n>This unevenness hampers productivity efforts, frustrates management's expectations, and creates uncertainty around the future roles of developers.<n>Our findings imply that widespread organizational expectations for rapid productivity gains without sufficient investment in learning support creates a "Productivity Pressure Paradox"
arXiv Detail & Related papers (2025-07-28T19:05:46Z)
Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity [0.5789840336223054]
Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied.<n>We conduct a randomized controlled trial (RCT) to understand how AI tools at the February-June 2025 frontier affect the productivity of experienced open-source developers.<n>16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience.
arXiv Detail & Related papers (2025-07-12T00:16:33Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [66.1850490474361]
We conduct the first academic study to explore developer interactions with coding agents.<n>We evaluate two leading copilot and agentic coding assistants, GitHub Copilot and OpenHands.<n>Our results show agents have the potential to assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments.<n>We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases.<n>Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
arXiv Detail & Related papers (2025-06-24T15:39:20Z)
Towards Decoding Developer Cognition in the Age of AI Assistants [9.887133861477233]
We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools.<n>We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time.
arXiv Detail & Related papers (2025-01-05T23:25:21Z)
Automated Code Review In Practice [1.6271516689052665]
Several AI-assisted tools, such as Qodo, GitHub Copilot, and Coderabbit, provide automated reviews using large language models (LLMs) This study examines the impact of LLM-based automated code review tools in an industrial setting.
arXiv Detail & Related papers (2024-12-24T16:24:45Z)
How much does AI impact development speed? An enterprise-based randomized controlled trial [8.759453531975668]
We estimate the impact of three AI features on the time developers spent on a complex, enterprise-grade task. We also found an interesting effect whereby developers who spend more hours on code-related activities per day were faster with AI.
arXiv Detail & Related papers (2024-10-16T18:31:14Z)
Does Co-Development with AI Assistants Lead to More Maintainable Code? A Registered Report [6.7428644467224]
This study aims to examine the influence of AI assistants on software maintainability. In Phase 1, developers will add a new feature to a Java project, with or without the aid of an AI assistant. In Phase 2, a randomized controlled trial, will involve a different set of developers evolving random Phase 1 projects - working without AI assistants.
arXiv Detail & Related papers (2024-08-20T11:48:42Z)
Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency. We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people. These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
Rethinking People Analytics With Inverse Transparency by Design [57.67333075002697]
We propose a new design approach for workforce analytics we refer to as inverse transparency by design. We find that architectural changes are made without inhibiting core functionality. We conclude that inverse transparency by design is a promising approach to realize accepted and responsible people analytics.
arXiv Detail & Related papers (2023-05-16T21:37:35Z)
A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges [23.467373994306524]
In practice, developers do not accept AI programming assistants' initial suggestions at a high frequency. To understand developers' practices while using these tools, we administered a survey to a large population of developers. We found that developers are most motivated to use AI programming assistants because they help developers reduce key-strokes, finish programming tasks quickly, and recall syntax. We also found the most important reasons why developers do not use these tools are because these tools do not output code that addresses certain functional or non-functional requirements.
arXiv Detail & Related papers (2023-03-30T03:21:53Z)
Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. In this work, we propose MEDAL++, a novel design for self-improving robotic systems. The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. The direct impact of this has been observed to be a reduction in 55% or more testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)
How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks. Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.