How far are AI-powered programming assistants from meeting developers' needs?
- URL: http://arxiv.org/abs/2404.12000v2
- Date: Wed, 24 Apr 2024 13:16:36 GMT
- Title: How far are AI-powered programming assistants from meeting developers' needs?
- Authors: Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang
- Abstract summary: In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits.
We simulate real development scenarios and recruit 27 computer science students to investigate their behavior with three popular ACATs.
We find that ACATs generally enhance task completion rates, reduce time, improve code quality, and increase self-perceived productivity.
- Score: 17.77734978425295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, in-depth investigation of the actual assistance process is still lacking. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer science students to investigate their behavior with three popular ACATs. Our goal is to comprehensively assess ACATs' effectiveness, explore characteristics of the recommended code, identify reasons for modifications, and understand users' challenges and expectations. To facilitate the study, we develop an experimental platform that includes a data collection plugin for the VSCode IDE and provides functions for screen recording, code evaluation, and automatic generation of personalized interview and survey questions. Through analysis of the collected data, we find that ACATs generally enhance task completion rates, reduce time, improve code quality, and increase self-perceived productivity. However, the improvement is influenced by both the nature of the coding tasks and users' experience level. Notably, for experienced participants, the use of ACATs may even increase completion time. We observe that "edited line completion" is the most frequently recommended completion type, while "comments completion" and "string completion" have the lowest acceptance rates. The primary reasons for modifying recommended code are disparities between output formats and requirements, flawed logic, and inconsistent code styles. In terms of challenges and expectations, beyond functionality and performance, participants are also concerned with better service access and help documentation. Our study provides valuable insights into the effectiveness and usability of ACATs, informing further improvements in their design and implementation.
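The abstract ships no code, but the acceptance-rate analysis it reports can be illustrated with a minimal sketch. Everything below (the event schema, field names, and values) is an illustrative assumption, not the authors' plugin format:

```python
from collections import defaultdict

# Hypothetical completion events, in the spirit of what the paper's VSCode
# data collection plugin records; the schema and field names are assumptions.
events = [
    {"kind": "edited line completion", "accepted": True},
    {"kind": "comments completion", "accepted": False},
    {"kind": "string completion", "accepted": False},
    {"kind": "edited line completion", "accepted": True},
    {"kind": "comments completion", "accepted": False},
]

shown = defaultdict(int)
accepted = defaultdict(int)
for event in events:
    shown[event["kind"]] += 1
    accepted[event["kind"]] += int(event["accepted"])

# Acceptance rate per completion type, the kind of statistic behind the
# finding that comment and string completions are accepted least often.
for kind in sorted(shown):
    rate = accepted[kind] / shown[kind]
    print(f"{kind}: {rate:.0%} ({accepted[kind]}/{shown[kind]} shown)")
```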
Related papers
- Which Combination of Test Metrics Can Predict Success of a Software Project? A Case Study in a Year-Long Project Course [1.553083901660282]
Testing plays an important role in securing the success of a software development project.
We investigate whether we can quantify the effects various types of testing have on functional suitability.
arXiv Detail & Related papers (2024-08-22T04:23:51Z)
- Code Compass: A Study on the Challenges of Navigating Unfamiliar Codebases [2.808331566391181]
We propose a novel tool, Code Compass, to address these issues.
Our study highlights a significant gap in current tools and methodologies.
Our formative study demonstrates how effectively the tool reduces the time developers spend navigating documentation.
arXiv Detail & Related papers (2024-05-10T06:58:31Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
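As a rough illustration of the highlighting idea described above, and not the authors' implementation, the sketch below marks completion tokens whose predicted edit likelihood crosses a threshold; the tokens, scores, and cutoff are all stand-ins:

```python
# Toy sketch: mark completion tokens whose predicted likelihood of being
# edited crosses a cutoff. In the paper's setting the scores come from a
# trained edit-likelihood predictor; here they are hard-coded stand-ins.
tokens = ["def", "parse", "(", "path", ")", ":"]
edit_likelihood = [0.02, 0.61, 0.01, 0.45, 0.01, 0.01]
THRESHOLD = 0.4  # assumed highlighting cutoff

def render(tokens, scores, threshold=THRESHOLD):
    # Bracket uncertain tokens to mimic in-IDE highlighting.
    return " ".join(
        f"[{tok}]" if score >= threshold else tok
        for tok, score in zip(tokens, scores)
    )

print(render(tokens, edit_likelihood))  # def [parse] ( [path] ) :
```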
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process.
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
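A minimal numerical sketch of the SubPT projection step, under the simplifying assumption that early-stage gradients are stacked row-wise and the subspace comes from their top singular vectors (the paper's exact procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows stand in for flattened prompt gradients recorded early in training.
early_grads = rng.normal(size=(32, 128))

# Span the "early-stage gradient flow" subspace with the top singular vectors.
k = 4
_, _, vt = np.linalg.svd(early_grads, full_matrices=False)
basis = vt[:k]  # (k, 128), orthonormal rows

def project(grad: np.ndarray) -> np.ndarray:
    """Project a later gradient onto the low-rank early-gradient subspace."""
    return basis.T @ (basis @ grad)

g = rng.normal(size=128)
print(np.linalg.norm(g), np.linalg.norm(project(g)))  # projection shrinks norm
```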
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
- No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence [33.438384268490815]
In this paper, we empirically evaluate the usage and effect of prompt tuning in code intelligence tasks.
Our results show that prompt tuning consistently outperforms fine-tuning in all three tasks.
Our results suggest that, instead of fine-tuning, prompt tuning could be adopted for code intelligence tasks to achieve better performance.
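For readers unfamiliar with prompt tuning, here is a schematic of the core mechanism, a learned soft prompt prepended to the input while the backbone stays frozen. The backbone, dimensions, and pooling choice are placeholders, not the paper's setup:

```python
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    """Schematic prompt tuning: only the soft prompt and head are trained."""
    def __init__(self, backbone, d_model, prompt_len, n_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # backbone stays frozen
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, token_embeddings):
        # Prepend the learned soft prompt to every sequence in the batch.
        batch = token_embeddings.shape[0]
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompt, token_embeddings], dim=1)
        h = self.backbone(x)                 # (batch, seq, d_model)
        return self.head(h.mean(dim=1))      # pool, then classify

# Toy usage with a stand-in backbone; a real setup would use a code LM.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(64, 4, batch_first=True), 2)
model = PromptTunedClassifier(backbone, d_model=64, prompt_len=8, n_classes=2)
print(model(torch.randn(3, 20, 64)).shape)  # torch.Size([3, 2])
```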
arXiv Detail & Related papers (2022-07-24T07:29:17Z)
- All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs [55.606644084003094]
We propose an approach for collecting completion usage logs from the users in an IDE.
We use them to train a machine learning based model for ranking completion candidates.
Our evaluation shows that a simple ranking model trained on past user behavior logs significantly improves the code completion experience.
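A hedged sketch of this log-based ranking idea: train a simple classifier on logged candidate features and rank new candidates by predicted acceptance probability. The feature set and data are invented for illustration, not taken from the paper:

```python
from sklearn.linear_model import LogisticRegression

# Invented per-candidate features extracted from usage logs:
# [model score, candidate length, prefix-match length]; labels mark whether
# the user accepted the shown candidate.
X = [
    [0.9, 12, 3], [0.4, 30, 1], [0.7, 8, 4],
    [0.2, 25, 0], [0.8, 10, 2], [0.3, 40, 1],
]
y = [1, 0, 1, 0, 1, 0]

ranker = LogisticRegression().fit(X, y)

# Rank fresh candidates by predicted acceptance probability.
candidates = [[0.6, 15, 2], [0.85, 9, 3], [0.3, 35, 0]]
scores = ranker.predict_proba(candidates)[:, 1]
for cand, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(cand, round(float(score), 3))
```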
arXiv Detail & Related papers (2022-05-21T23:21:26Z)
- CodeReviewer: Pre-Training for Automating Code Review Activities [36.40557768557425]
This research focuses on utilizing pre-training techniques for the tasks in the code review scenario.
We collect a large-scale dataset of real-world code changes and code reviews from open-source projects in nine of the most popular programming languages.
To better understand code diffs and reviews, we propose CodeReviewer, a pre-trained model that utilizes four pre-training tasks tailored specifically for the code review scenario.
arXiv Detail & Related papers (2022-03-17T05:40:13Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
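To make the retrieve-then-complete flow concrete, here is a toy sketch; the bag-of-words retriever stands in for the paper's retriever, and the final language-model call is omitted:

```python
from collections import Counter
from math import sqrt

# Toy retrieval database standing in for a large code corpus.
corpus = [
    "def read_json(path): return json.load(open(path))",
    "def write_csv(rows, path): csv.writer(open(path, 'w')).writerows(rows)",
]

def bow(text):
    return Counter(text.replace("(", " ").replace(")", " ").split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * \
          sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query):
    q = bow(query)
    return max(corpus, key=lambda doc: cosine(q, bow(doc)))

# Prepend the retrieved similar snippet to the unfinished code, then hand
# the combined prompt to a code LM (the model call is omitted here).
unfinished = "def load_config(path): return json."
prompt = retrieve(unfinished) + "\n" + unfinished
print(prompt)
```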
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
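A minimal sketch of the underlying few-shot mechanism, nearest-prototype classification over embeddings; the random vectors below stand in for the paper's learned transformer encoder, and the feedback labels are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random vectors stand in for learned embeddings of student submissions.
support = {
    "off_by_one":   [rng.normal(size=8) for _ in range(3)],
    "wrong_return": [rng.normal(size=8) for _ in range(3)],
}

# One prototype per feedback class: the mean of its few support embeddings.
protos = {label: np.mean(vecs, axis=0) for label, vecs in support.items()}

def classify(query):
    """Label a new submission by its nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda label: np.linalg.norm(query - protos[label]))

print(classify(rng.normal(size=8)))
```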
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)