AI-assisted coding: Experiments with GPT-4
- URL: http://arxiv.org/abs/2304.13187v1
- Date: Tue, 25 Apr 2023 22:59:01 GMT
- Title: AI-assisted coding: Experiments with GPT-4
- Authors: Russell A Poldrack, Thomas Lu, and Gašper Beguš
- Abstract summary: GPT-4 can generate tests with substantial coverage, but many of the tests fail when applied to the associated code.
These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.
- Score: 0.22366638308792727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) tools based on large language models have
achieved human-level performance on some computer programming tasks. We report
several experiments using GPT-4 to generate computer code. These experiments
demonstrate that AI code generation using the current generation of tools,
while powerful, requires substantial human validation to ensure accurate
performance. We also demonstrate that GPT-4 refactoring of existing code can
significantly improve that code along several established metrics for code
quality, and we show that GPT-4 can generate tests with substantial coverage,
but that many of the tests fail when applied to the associated code. These
findings suggest that while AI coding tools are very powerful, they still
require humans in the loop to ensure validity and accuracy of the results.
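The abstract does not include the experiment scripts; as a rough illustration of the test-generation experiment it describes, the sketch below asks GPT-4 to write pytest tests for a small function and then runs them to see how many pass. It assumes the `openai` v1 Python client and `pytest` are installed and an API key is configured; the prompt, model name, and example function are illustrative, not the paper's actual setup.

```python
# Minimal sketch of the test-generation experiment (not the paper's actual
# pipeline): ask GPT-4 to write pytest tests for a small function, save them,
# and run pytest to see how many pass. Assumes the `openai` v1 Python client
# and `pytest` are installed and OPENAI_API_KEY is set in the environment.
import subprocess
from pathlib import Path

from openai import OpenAI

FUNCTION_UNDER_TEST = '''\
def count_vowels(text: str) -> int:
    """Return the number of vowels in `text`."""
    return sum(ch in "aeiouAEIOU" for ch in text)
'''

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Write pytest tests for this function. Return only Python code.\n\n"
                   + FUNCTION_UNDER_TEST,
    }],
)
# In practice the reply may be wrapped in markdown fences that need stripping.
generated_tests = response.choices[0].message.content

Path("vowels.py").write_text(FUNCTION_UNDER_TEST)
Path("test_vowels.py").write_text("from vowels import count_vowels\n\n" + generated_tests)
result = subprocess.run(["pytest", "test_vowels.py", "-q"], capture_output=True, text=True)
print(result.stdout)  # inspect how many generated tests pass or fail
```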
Related papers
- An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We? [8.0988059417354]
We propose a range of approaches to improve the performance of AI-generated code detection.
Our best model outperforms state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55.
arXiv Detail & Related papers (2024-11-06T22:48:18Z) - Disrupting Test Development with AI Assistants [1.024113475677323]
Generative AI-assisted coding tools like GitHub Copilot, ChatGPT, and Tabnine have significantly transformed software development.
This paper analyzes how these innovations impact productivity and software test development metrics.
arXiv Detail & Related papers (2024-11-04T17:52:40Z) - Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A case study on CodeChef problems [0.13124513975412253]
We use code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code.
Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4.
Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
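The summary does not list the paper's actual stylometric features; the sketch below shows the general recipe with a few hypothetical surface features (mean line length, comment ratio, identifier length, blank-line density), toy data, and a scikit-learn classifier.

```python
# Illustrative code-stylometry classifier (hypothetical surface features and
# toy data, not the paper's feature set or corpus). Assumes scikit-learn.
import re

from sklearn.ensemble import RandomForestClassifier


def stylometric_features(source: str) -> list[float]:
    """Extract a few simple surface-level style features from source code."""
    lines = source.splitlines() or [""]
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    mean_ident = sum(map(len, identifiers)) / len(identifiers) if identifiers else 0.0
    return [
        sum(len(line) for line in lines) / len(lines),                      # mean line length
        sum(line.strip().startswith("#") for line in lines) / len(lines),   # comment ratio
        mean_ident,                                                         # mean identifier length
        source.count("\n\n") / len(lines),                                  # blank-line density
    ]


# Toy corpus: label 1 = AI-generated, 0 = human-authored.
samples = [
    "def add(a, b):\n    \"\"\"Return the sum of a and b.\"\"\"\n    return a + b\n",
    "# quick hack\nfor i in range(10):\n    print(i*i)\n",
]
labels = [1, 0]

X = [stylometric_features(s) for s in samples]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)  # with a real corpus, evaluate with cross-validation / F1
print(clf.predict(X))
```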
arXiv Detail & Related papers (2024-03-06T19:51:26Z) - OpenAi's GPT4 as coding assistant [0.0]
GPT4 is considered the most potent Large Language Model from OpenAI.
In this paper, we examine GPT3.5 and GPT4 as coding assistants.
arXiv Detail & Related papers (2023-09-22T09:31:39Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
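The paper's edit-likelihood predictor is a separately trained model and is not reproduced here; as a crude proxy for the same idea, the sketch below flags completion tokens to which the model itself assigned low probability, using logprobs from the `openai` v1 client. The threshold and model name are hypothetical.

```python
# Crude proxy for uncertainty highlighting (NOT the paper's trained
# edit-likelihood model): flag completion tokens the model assigned low
# probability, using logprobs from the `openai` v1 client.
import math

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name; must support logprobs
    logprobs=True,
    messages=[{"role": "user", "content": "Complete this Python function:\ndef fibonacci(n):"}],
)

THRESHOLD = 0.5  # hypothetical probability cutoff for flagging a token
for token_info in response.choices[0].logprobs.content:
    prob = math.exp(token_info.logprob)
    flag = "  <-- uncertain" if prob < THRESHOLD else ""
    print(f"{token_info.token!r:>16}  p={prob:.2f}{flag}")
```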
arXiv Detail & Related papers (2023-02-14T18:43:34Z) - Aligning Offline Metrics and Human Judgments of Value for Code Generation Models [25.726216146776054]
We show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task.
We propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value.
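A minimal sketch of such a hybrid metric is shown below, blending functional correctness with syntactic similarity to a reference solution; the difflib similarity measure and the 0.5 weight are illustrative stand-ins, not the paper's exact formulation.

```python
# Sketch of a hybrid code-value metric: blend functional correctness (did the
# candidate pass its unit tests?) with syntactic similarity to a reference
# solution. The difflib similarity and the 0.5 weight are illustrative.
import difflib


def hybrid_score(candidate: str, reference: str, passed_tests: bool,
                 weight: float = 0.5) -> float:
    """Weighted blend of functional correctness and surface similarity."""
    similarity = difflib.SequenceMatcher(None, candidate, reference).ratio()
    return weight * float(passed_tests) + (1.0 - weight) * similarity


reference = "def add(a, b):\n    return a + b\n"
candidate = "def add(x, y):\n    result = x + y\n    return result\n"
print(hybrid_score(candidate, reference, passed_tests=False))  # partial credit despite failing tests
```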
arXiv Detail & Related papers (2022-10-29T05:03:28Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
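The sketch below shows only the basic execute-and-select idea behind this approach: run each candidate solution against model-generated test assertions and keep the candidate that satisfies the most of them. CodeT's actual dual execution agreement scoring is more elaborate, and the toy candidates and tests here are hypothetical.

```python
# Simplified sketch of execute-and-select (not CodeT's full dual execution
# agreement): run each candidate against generated test assertions and keep
# the one that passes the most. exec() here is unsandboxed; real harnesses
# must isolate untrusted code.
candidates = [
    "def is_even(n):\n    return n % 2 == 0\n",
    "def is_even(n):\n    return n % 2 == 1\n",  # buggy candidate
]
generated_tests = [
    "assert is_even(2)",
    "assert not is_even(3)",
    "assert is_even(0)",
]


def passed_count(solution: str, tests: list[str]) -> int:
    """Count how many generated test assertions a candidate solution satisfies."""
    score = 0
    for test in tests:
        try:
            exec(solution + "\n" + test, {})  # untrusted code: sandbox in practice
            score += 1
        except Exception:
            pass
    return score


best = max(candidates, key=lambda s: passed_count(s, generated_tests))
print(best)
```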
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
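An illustrative harness for this style of evaluation is sketched below: run a candidate program on each test input, compare its stdout with the expected output, and report the fraction of test cases passed. This is not the official APPS evaluation code, and the solution and test cases are toy examples.

```python
# Illustrative harness for APPS-style evaluation (not the official APPS code):
# run a candidate program on each test input, compare stdout to the expected
# output, and report the fraction of test cases passed.
import subprocess
import sys

solution = "print(sum(int(x) for x in input().split()))\n"
test_cases = [("1 2 3", "6"), ("10 20", "30"), ("5", "5")]

passed = 0
for stdin_data, expected in test_cases:
    run = subprocess.run(
        [sys.executable, "-c", solution],
        input=stdin_data, capture_output=True, text=True, timeout=5,
    )
    if run.stdout.strip() == expected.strip():
        passed += 1

print(f"passed {passed}/{len(test_cases)} test cases "
      f"({100 * passed / len(test_cases):.0f}%)")
```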
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.