Generation Probabilities Are Not Enough: Exploring the Effectiveness of
Uncertainty Highlighting in AI-Powered Code Completions
- URL: http://arxiv.org/abs/2302.07248v1
- Date: Tue, 14 Feb 2023 18:43:34 GMT
- Title: Generation Probabilities Are Not Enough: Exploring the Effectiveness of
Uncertainty Highlighting in AI-Powered Code Completions
- Authors: Helena Vasconcelos, Gagan Bansal, Adam Fourney, Q. Vera Liao, and
Jennifer Wortman Vaughan
- Abstract summary: We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale generative models have enabled the development of AI-powered code
completion tools to assist programmers in writing code. However, much like
other AI-powered tools, AI-powered code completions are not always accurate,
potentially introducing bugs or even security vulnerabilities into code if not
properly detected and corrected by a human programmer. One technique that has
been proposed and implemented to help programmers identify potential errors is
to highlight uncertain tokens. However, there have been no empirical studies
exploring the effectiveness of this technique, nor investigating the different
and not-yet-agreed-upon notions of uncertainty in the context of generative
models. We explore the question of whether conveying information about
uncertainty enables programmers to more quickly and accurately produce code
when collaborating with an AI-powered code completion tool, and if so, what
measure of uncertainty best fits programmers' needs. Through a mixed-methods
study with 30 programmers, we compare three conditions: providing the AI
system's code completion alone, highlighting tokens with the lowest likelihood
of being generated by the underlying generative model, and highlighting tokens
with the highest predicted likelihood of being edited by a programmer. We find
that highlighting tokens with the highest predicted likelihood of being edited
leads to faster task completion and more targeted edits, and is subjectively
preferred by study participants. In contrast, highlighting tokens according to
their probability of being generated does not provide any benefit over the
baseline with no highlighting. We further explore the design space of how to
convey uncertainty in AI-powered code completion tools, and find that
programmers prefer highlights that are granular, informative, interpretable,
and not overwhelming.
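The first highlighting condition in the abstract (flagging tokens with the lowest generation likelihood) can be illustrated with a minimal sketch. This is not the paper's implementation; the `highlight_uncertain_tokens` helper, the `[[...]]` marker syntax, the threshold value, and the token probabilities are all illustrative assumptions. In a real tool the per-token probabilities would come from the underlying generative model.

```python
# Illustrative sketch (not the paper's implementation): wrap tokens
# whose generation probability falls below a threshold in [[...]]
# markers, mimicking the "lowest generation likelihood" condition.
# The threshold and all probability values below are made up.

def highlight_uncertain_tokens(token_probs, threshold=0.5):
    """Render a completion with low-probability tokens marked.

    token_probs: list of (token, probability) pairs as they would be
    reported by the model for each generated token.
    Returns the rendered string and the list of flagged tokens.
    """
    rendered, flagged = [], []
    for token, prob in token_probs:
        if prob < threshold:
            rendered.append(f"[[{token}]]")
            flagged.append(token)
        else:
            rendered.append(token)
    return "".join(rendered), flagged

# Hypothetical completion with per-token probabilities.
completion = [("def ", 0.98), ("parse", 0.35), ("(", 0.99),
              ("path", 0.62), (")", 0.99), (":", 0.97)]
text, flagged = highlight_uncertain_tokens(completion)
# text == "def [[parse]](path):", flagged == ["parse"]
```

The second condition studied in the paper (highlighting tokens most likely to be *edited* by a programmer) would use the same rendering step but a different scoring model, which is the key distinction the study draws.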
Related papers
- CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [59.32609948217718]
We present CodeIP, a new watermarking technique for LLM-based code generation.
CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
- DeVAIC: A Tool for Security Assessment of AI-generated Code [5.383910843560784]
DeVAIC (Detection of Vulnerabilities in AI-generated Code) is a tool to evaluate the security of AI-generated Python code.
arXiv Detail & Related papers (2024-04-11T08:27:23Z)
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods have limited applicability to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- Students' Perspective on AI Code Completion: Benefits and Challenges [2.936007114555107]
We investigated the benefits, challenges, and expectations of AI code completion from students' perspectives.
Our findings show that AI code completion enhanced students' productivity and efficiency by providing correct syntax suggestions.
In the future, AI code completion should be explainable and provide best coding practices to enhance the education process.
arXiv Detail & Related papers (2023-10-31T22:41:16Z)
- PrAIoritize: Automated Early Prediction and Prioritization of Vulnerabilities in Smart Contracts [1.081463830315253]
Smart contracts are prone to numerous security threats due to undisclosed vulnerabilities and code weaknesses.
Efficient prioritization is crucial for smart contract security.
Our research aims to provide an automated approach, PrAIoritize, for prioritizing and predicting critical code weaknesses.
arXiv Detail & Related papers (2023-08-21T23:30:39Z)
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion.
arXiv Detail & Related papers (2023-07-04T21:25:12Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z)
- Aligning Offline Metrics and Human Judgments of Value for Code Generation Models [25.726216146776054]
We show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task.
We propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value.
arXiv Detail & Related papers (2022-10-29T05:03:28Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.