Beyond the Commit: Developer Perspectives on Productivity with AI Coding Assistants
- URL: http://arxiv.org/abs/2602.03593v1
- Date: Tue, 03 Feb 2026 14:47:30 GMT
- Authors: Valerie Chen, Jasmyn He, Behnjamin Williams, Jason Valentino, Ameet Talwalkar,
- Abstract summary: This study analyzes the validity of different approaches to evaluating the productivity impacts of AI coding assistants.
Survey results expose conflicting perspectives on AI tool usefulness, while interviews elicit six distinct factors that capture both short-term and long-term dimensions of productivity.
- Score: 15.506325982937241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring developer productivity is a topic that has attracted attention from both academic research and industrial practice. In the age of AI coding assistants, it has become even more important for both academia and industry to understand how to measure their impact on developer productivity, and to reconsider whether earlier measures and frameworks still apply. This study analyzes the validity of different approaches to evaluating the productivity impacts of AI coding assistants by leveraging mixed-method research. At BNY Mellon, we conduct a survey with 2989 developer responses and 11 in-depth interviews. Our findings demonstrate that a multifaceted approach is needed to measure AI productivity impacts: survey results expose conflicting perspectives on AI tool usefulness, while interviews elicit six distinct factors that capture both short-term and long-term dimensions of productivity. In contrast to prior work, our factors highlight the importance of long-term metrics like technical expertise and ownership of work. We hope this work encourages future research to incorporate a broader range of human-centered factors, and supports industry in adopting more holistic approaches to evaluating developer productivity.
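The abstract does not include analysis code, but the survey side of the design lends itself to a small illustration. Below is a minimal sketch, assuming Likert-style ratings grouped by productivity factor; every factor name except "technical expertise" and "ownership of work" (the two the paper highlights) is a hypothetical placeholder, and all ratings are invented:

```python
# Minimal sketch (not the authors' analysis code): aggregate per-factor
# Likert ratings and flag dispersion, one way the "conflicting perspectives"
# the survey exposed could surface quantitatively. All ratings are invented.
from statistics import mean, stdev

# factor -> 1-5 ratings; only the last two factor names come from the paper.
responses = {
    "task_speed": [5, 4, 2, 5, 1, 4],           # hypothetical placeholder factor
    "technical_expertise": [2, 3, 2, 4, 2, 3],  # long-term factor named in the paper
    "ownership_of_work": [3, 2, 4, 2, 3, 2],    # long-term factor named in the paper
}

for factor, ratings in responses.items():
    avg, spread = mean(ratings), stdev(ratings)
    flag = "  <- divided opinions" if spread > 1.2 else ""
    print(f"{factor}: mean={avg:.2f}, sd={spread:.2f}{flag}")
```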
Related papers
- Evolving with AI: A Longitudinal Analysis of Developer Logs [3.7353323067733473]
We study how sustained AI use reshapes actual daily coding practices in the long term.
We analyze five dimensions of workflow change: productivity, code quality, code editing, code reuse, and context switching.
Our results offer empirical insights into the silent restructuring of software and provide implications for designing future AI-augmented tooling.
arXiv Detail & Related papers (2026-01-15T10:30:24Z)
- The SPACE of AI: Real-World Lessons on AI's Impact on Developers [0.807084206814932]
We study how developers perceive AI's influence across the dimensions of the SPACE framework: Satisfaction, Performance, Activity, Collaboration, and Efficiency.
We find that AI is broadly adopted and widely seen as enhancing productivity, particularly for routine tasks.
Developers report increased efficiency and satisfaction, with less evidence of impact on collaboration.
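Since the five SPACE dimensions are named explicitly, they can be sketched as a simple per-developer record. A minimal illustration with invented values (this is not the study's instrument):

```python
# Minimal sketch of a per-developer SPACE record; all values are invented
# for illustration and do not come from the study.
from dataclasses import dataclass

@dataclass
class SpaceScores:
    satisfaction: float   # S: well-being and satisfaction
    performance: float    # P: quality of outcomes
    activity: float       # A: volume of actions (commits, reviews, ...)
    collaboration: float  # C: communication and collaboration
    efficiency: float     # E: efficiency and flow

dev = SpaceScores(satisfaction=4.2, performance=3.8, activity=3.5,
                  collaboration=3.0, efficiency=4.0)
# The paper reports gains concentrated in efficiency and satisfaction,
# with less evidence of change in collaboration.
print(dev)
```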
arXiv Detail & Related papers (2025-07-31T21:45:54Z)
- The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review [4.503986781849658]
Large language model assistants (LLM-assistants) present new opportunities to transform software development.
Despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity.
Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks.
arXiv Detail & Related papers (2025-07-03T20:25:49Z)
- The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.
We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
- What Makes a Good Natural Language Prompt? [72.3282960118995]
We conduct a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences from 2022 to 2025.
We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions.
We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact.
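A property-based rubric like the one described is easy to sketch as executable checks. The property names and predicates below are hypothetical stand-ins, not the paper's 21 properties:

```python
# Minimal sketch of property-based prompt scoring in the spirit of the
# surveyed framework. These checks are invented examples; the paper itself
# defines 21 properties across six dimensions.

# Each property is a (name, predicate) pair over the raw prompt text.
PROPERTIES = [
    ("states_task_explicitly",
     lambda p: any(w in p.lower() for w in ("summarize", "classify", "translate"))),
    ("specifies_output_format",
     lambda p: "format" in p.lower() or "json" in p.lower()),
    ("asks_for_reasoning",
     lambda p: "step by step" in p.lower()),
]

def score_prompt(prompt: str) -> float:
    """Return the fraction of property checks the prompt satisfies."""
    passed = sum(pred(prompt) for _, pred in PROPERTIES)
    return passed / len(PROPERTIES)

print(score_prompt("Summarize the report step by step and return JSON."))  # 1.0
```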
arXiv Detail & Related papers (2025-06-07T23:19:27Z)
- When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration [79.69935257008467]
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for evaluating human-AI knowledge transfer.
We conduct the first large-scale human study (N=118) explicitly designed to measure it.
In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating the influence of model explanations on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z)
- Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness [50.29739337771454]
Multi-agent debate (MAD) approaches offer improved reasoning, robustness, and diverse perspectives over monolithic models.
This paper conceptualizes MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities.
We conduct a comprehensive empirical investigation comparing MAD with strong single-agent test-time scaling baselines on mathematical reasoning and safety-related tasks.
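Read as test-time scaling, MAD spends extra compute on rounds of mutual critique. A minimal sketch of that loop, with `query_model` as a stub standing in for a real LLM call:

```python
# Minimal sketch of multi-agent debate as test-time scaling: several agents
# answer, read each other's answers, and revise over rounds. `query_model`
# is a hypothetical stub; a real system would call an LLM here.
def query_model(agent_id: int, question: str, peer_answers: list[str]) -> str:
    # Stub: a real implementation would prompt an LLM with the question
    # plus the peers' latest answers and ask it to refine its own.
    context = " | ".join(peer_answers) if peer_answers else "none"
    return f"agent{agent_id} answer given peers [{context}]"

def debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> list[str]:
    answers = [query_model(i, question, []) for i in range(n_agents)]
    for _ in range(n_rounds - 1):
        # Each extra round spends more test-time compute on collaborative refinement.
        answers = [query_model(i, question, answers[:i] + answers[i + 1:])
                   for i in range(n_agents)]
    return answers  # a final step would aggregate, e.g., by majority vote

print(debate("What is 17 * 24?"))
```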
arXiv Detail & Related papers (2025-05-29T01:02:55Z)
- Evaluations at Work: Measuring the Capabilities of GenAI in Use [28.124088786766965]
Current AI benchmarks miss the messy, multi-turn nature of human-AI collaboration.
We present an evaluation framework that decomposes real-world tasks into interdependent subtasks.
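The decomposition idea can be sketched as a dependency graph over subtasks, evaluated in dependency order with Python's standard-library graphlib. The subtask graph below is a made-up example, not one from the paper:

```python
# Minimal sketch of decomposing a task into interdependent subtasks.
# The graph is invented for illustration.
from graphlib import TopologicalSorter

# subtask -> set of subtasks it depends on
subtasks = {
    "gather_requirements": set(),
    "draft_outline": {"gather_requirements"},
    "write_sections": {"draft_outline"},
    "review_with_ai": {"write_sections"},
    "final_edit": {"review_with_ai", "draft_outline"},
}

# Scoring each multi-turn human-AI interaction subtask by subtask, in
# dependency order, evaluates it in its real context.
for step in TopologicalSorter(subtasks).static_order():
    print("evaluate:", step)
```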
arXiv Detail & Related papers (2025-05-15T23:06:23Z)
- Towards Decoding Developer Cognition in the Age of AI Assistants [9.887133861477233]
We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools.
We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time.
arXiv Detail & Related papers (2025-01-05T23:25:21Z)
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)