AI-assisted coding: Experiments with GPT-4
- URL: http://arxiv.org/abs/2304.13187v1
- Date: Tue, 25 Apr 2023 22:59:01 GMT
- Title: AI-assisted coding: Experiments with GPT-4
- Authors: Russell A Poldrack, Thomas Lu, and Ga\v{s}per Begu\v{s}
- Abstract summary: GPT-4 can generate tests with substantial coverage, but that many of the tests fail applied to the associated code.
These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.
- Score: 0.22366638308792727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) tools based on large language models have
acheived human-level performance on some computer programming tasks. We report
several experiments using GPT-4 to generate computer code. These experiments
demonstrate that AI code generation using the current generation of tools,
while powerful, requires substantial human validation to ensure accurate
performance. We also demonstrate that GPT-4 refactoring of existing code can
significantly improve that code along several established metrics for code
quality, and we show that GPT-4 can generate tests with substantial coverage,
but that many of the tests fail when applied to the associated code. These
findings suggest that while AI coding tools are very powerful, they still
require humans in the loop to ensure validity and accuracy of the results.
Related papers
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [59.32609948217718]
We present CodeIP, a new watermarking technique for Large Language Models (LLMs)-based code generation.
CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A
case study on CodeChef problems [0.13124513975412253]
We use code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code.
Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4.
Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
arXiv Detail & Related papers (2024-03-06T19:51:26Z) - A systematic evaluation of large language models for generating
programming code [0.0]
GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2.
In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants.
arXiv Detail & Related papers (2024-03-01T14:43:06Z) - Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z) - OpenAi's GPT4 as coding assistant [0.0]
GPT4 is considered the most potent Large Language Model from Openai.
In this paper, we examine GPT3.5 and GPT4 as coding assistants.
arXiv Detail & Related papers (2023-09-22T09:31:39Z) - A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z) - Aligning Offline Metrics and Human Judgments of Value for Code
Generation Models [25.726216146776054]
We show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task.
We propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value.
arXiv Detail & Related papers (2022-10-29T05:03:28Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.