Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise
- URL: http://arxiv.org/abs/2412.06603v2
- Date: Tue, 04 Mar 2025 18:59:26 GMT
- Title: Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise
- Authors: Justin D. Weisz, Shraddha Kumar, Michael Muller, Karen-Ellen Browne, Arielle Goldberg, Ellice Heintze, Shagun Bajpai,
- Abstract summary: watsonx Code Assistant (WCA) deployed internally within IBM.<n>We examined developers' experiences with WCA and its impact on their productivity.
- Score: 16.33294877731443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI assistants are being created to help software engineers conduct a variety of coding-related tasks, such as writing, documenting, and testing code. We describe the use of the watsonx Code Assistant (WCA), an LLM-powered coding assistant deployed internally within IBM. Through surveys of two user cohorts (N=669) and unmoderated usability testing (N=15), we examined developers' experiences with WCA and its impact on their productivity. We learned about their motivations for using (or not using) WCA, we examined their expectations of its speed and quality, and we identified new considerations regarding ownership of and responsibility for generated code. Our case study characterizes the impact of an LLM-powered assistant on developers' perceptions of productivity and it shows that although such tools do often provide net productivity increases, these benefits may not always be experienced by all users.
Related papers
- Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study [4.01226690413941]
We surveyed 57 developers about their experience with AI coding assistants and CodeLLMs.<n>We reviewed 35 user surveys on the usage, experience and expectations of professionals and students using AI coding assistants and CodeLLMs.
arXiv Detail & Related papers (2026-01-27T22:57:20Z) - "My productivity is boosted, but ..." Demystifying Users' Perception on AI Coding Assistants [13.118506949442564]
We identify 1,085 AI coding assistants from the Visual Studio Code Marketplace.<n>We then manually analyze the user reviews sampled from 32 AI coding assistants that have sufficient installations and reviews to construct a comprehensive taxonomy of user concerns and feedback about these assistants.<n>We propose five practical implications and suggestions to guide the enhancement of AI coding assistants that satisfy user needs.
arXiv Detail & Related papers (2025-08-17T08:22:47Z) - The SPACE of AI: Real-World Lessons on AI's Impact on Developers [0.807084206814932]
We study how developers perceive AI's influence across the dimensions of the SPACE framework: Satisfaction, Performance, Activity, Collaboration and Efficiency.<n>We find that AI is broadly adopted and widely seen as enhancing productivity, particularly for routine tasks.<n>Developers report increased efficiency and satisfaction, with less evidence of impact on collaboration.
arXiv Detail & Related papers (2025-07-31T21:45:54Z) - Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [66.1850490474361]
We conduct the first academic study to explore developer interactions with coding agents.<n>We evaluate two leading copilot and agentic coding assistants, GitHub Copilot and OpenHands.<n>Our results show agents have the potential to assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Towards Decoding Developer Cognition in the Age of AI Assistants [9.887133861477233]
We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools.
We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time.
arXiv Detail & Related papers (2025-01-05T23:25:21Z) - Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace [2.5280615594444567]
Generative AI coding tools are relatively new, and their impact on developers extends beyond traditional coding metrics.
This study aims to illuminate developers' preexisting beliefs about generative AI tools, their self perceptions, and how regular use of these tools may alter these beliefs.
Our findings reveal that the introduction and sustained use of generative AI coding tools significantly increases developers' perceptions of these tools as both useful and enjoyable.
arXiv Detail & Related papers (2024-10-24T00:07:27Z) - Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - Does Co-Development with AI Assistants Lead to More Maintainable Code? A Registered Report [6.7428644467224]
This study aims to examine the influence of AI assistants on software maintainability.
In Phase 1, developers will add a new feature to a Java project, with or without the aid of an AI assistant.
In Phase 2, a randomized controlled trial, will involve a different set of developers evolving random Phase 1 projects - working without AI assistants.
arXiv Detail & Related papers (2024-08-20T11:48:42Z) - WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - How far are AI-powered programming assistants from meeting developers' needs? [17.77734978425295]
In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits.
We simulate real development scenarios and recruit 27 computer science students to investigate their behavior with three popular ACATs.
We find that ACATs generally enhance task completion rates, reduce time, improve code quality, and increase self-perceived productivity.
arXiv Detail & Related papers (2024-04-18T08:51:14Z) - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z) - Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z) - How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging [28.321080454393687]
Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds.
Human novices play the role of Teaching Assistants and help LLM-powered teachable agents code.
We introduce Hypo, a novel system to facilitate deliberate practice on debug, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents code.
arXiv Detail & Related papers (2023-10-08T21:39:47Z) - LLM-based Interaction for Content Generation: A Case Study on the
Perception of Employees in an IT department [85.1523466539595]
This paper presents a questionnaire survey to identify the intention to use generative tools by employees of an IT company.
Our results indicate a rather average acceptability of generative tools, although the more useful the tool is perceived to be, the higher the intention seems to be.
Our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work.
arXiv Detail & Related papers (2023-04-18T15:35:43Z) - Supporting Qualitative Analysis with Large Language Models: Combining
Codebook with GPT-3 for Deductive Coding [45.5690960017762]
This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM could be used directly for various tasks without fine-tuning through prompt learning.
Using a curiosity-driven questions coding task as a case study, we found, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreements with expert-coded results.
arXiv Detail & Related papers (2023-04-17T04:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.