OpenAi's GPT4 as coding assistant
- URL: http://arxiv.org/abs/2309.12732v1
- Date: Fri, 22 Sep 2023 09:31:39 GMT
- Title: OpenAi's GPT4 as coding assistant
- Authors: Lefteris Moussiades and George Zografos
- Abstract summary: GPT-4 is considered the most potent Large Language Model from OpenAI.
In this paper, we examine GPT-3.5 and GPT-4 as coding assistants.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lately, Large Language Models have been widely used in code generation. GPT-4
is considered the most potent Large Language Model from OpenAI. In this paper,
we examine GPT-3.5 and GPT-4 as coding assistants. More specifically, we have
constructed appropriate tests to check whether the two systems can a) answer
typical questions that can arise during code development, b) produce
reliable code, and c) contribute to code debugging. The test results are
impressive. The performance of GPT-4 is outstanding and signals an increase in
programmer productivity and a reorganization of software development
procedures around these new tools.
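As a concrete illustration of this kind of evaluation, here is a minimal sketch of a harness that sends the same coding task to GPT-3.5 and GPT-4 and scores the returned code against unit tests. It assumes the OpenAI Python SDK (v1+) with an OPENAI_API_KEY in the environment; the task, prompt wording, and test cases are hypothetical stand-ins, not the paper's actual test battery.

```python
# Minimal sketch: ask two models for a solution and score it on unit tests.
# The task and tests below are hypothetical, not the paper's test battery.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()

TASK = "Write a Python function is_prime(n) that returns True iff n is prime."
TESTS = [(2, True), (4, False), (17, True), (1, False)]  # illustrative cases

def generate_code(model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": TASK + " Reply with plain Python code only."}],
    )
    # A real harness would also strip markdown fences from the reply.
    return resp.choices[0].message.content

def pass_rate(code: str) -> float:
    """Exec the candidate in a scratch namespace and return its pass rate."""
    ns = {}
    try:
        exec(code, ns)  # run untrusted code only inside a proper sandbox
    except Exception:
        return 0.0
    fn = ns.get("is_prime")
    if not callable(fn):
        return 0.0
    passed = 0
    for arg, expected in TESTS:
        try:
            passed += fn(arg) == expected
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(TESTS)

for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, "pass rate:", pass_rate(generate_code(model)))
```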
Related papers
- Comparing Human and LLM Generated Code: The Jury is Still Out! [8.456554883523472]
We compare the effectiveness of large language models (LLMs) and human programmers in producing Python software code.
We use various static analysis benchmarks, including Pylint, Radon, and Bandit, as well as test cases (a usage sketch follows this entry).
We observe security flaws in code generated by both humans and GPT-4, but GPT-4 code included more severe outliers.
arXiv Detail & Related papers (2025-01-28T11:11:36Z)
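As a rough sketch of how the static-analysis checks named in this entry can be scripted, the snippet below shells out to Pylint, Radon, and Bandit on a single file. The target file name and flag choices are assumptions for illustration, not the study's evaluation setup.

```python
# Sketch: run Pylint, Radon, and Bandit on one candidate file.
# Assumes the tools are installed (pip install pylint radon bandit) and that
# "candidate.py" (hypothetical) holds the code under review.
import subprocess

TARGET = "candidate.py"

CHECKS = {
    "pylint (lint)": ["pylint", TARGET],
    "radon (cyclomatic complexity)": ["radon", "cc", "-s", TARGET],
    "bandit (security)": ["bandit", TARGET],
}

for name, cmd in CHECKS.items():
    # These tools exit nonzero when they find issues, so don't use check=True.
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"== {name} ==")
    print(result.stdout or result.stderr)
```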
- A case study on the transformative potential of AI in software engineering on LeetCode and ChatGPT [0.0]
This study compares the software quality of Python programs produced by LeetCode users with that of programs generated by GPT-4o.
The findings indicate that GPT-4o does not present a considerable impediment to code quality, understandability, or runtime when generating code on a limited scale.
arXiv Detail & Related papers (2025-01-07T09:15:25Z)
- Exploring the Potential of Llama Models in Automated Code Refinement: A Replication Study [2.930521532345053]
We explore alternatives to ChatGPT in code refinement tasks by including two open-source, smaller-scale large language models: CodeLlama and Llama 2.
Our results show that, if properly tuned, the Llama models can achieve reasonable performance, often comparable to ChatGPT in automated code refinement.
Our study highlights the potential of open-source models for code refinement, offering cost-effective, privacy-conscious solutions for real-world software development.
arXiv Detail & Related papers (2024-12-03T19:39:31Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language (a toy illustration follows this entry).
Various experiments and ablations are done on four datasets covering both C++ and Python to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
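To give a flavor of what a "graphical view" of code can mean, here is a toy sketch that uses Python's ast module to collect statement nodes, sequential control-flow edges, and def-use (data-flow) edges for one function. It is an intuition aid under loose assumptions, not CodeGRAG's actual graph construction or its GNN components.

```python
# Toy sketch of a "graphical view" of code: statement nodes plus flow edges.
# Illustrative of the general idea only; NOT CodeGRAG's actual construction.
import ast

SRC = """
def f(x):
    y = x + 1
    z = y * 2
    return z
"""

func = ast.parse(SRC).body[0]

nodes, edges = [], []
last_def = {}  # variable name -> index of the statement that last defined it

for i, stmt in enumerate(func.body):
    nodes.append((i, ast.dump(stmt)[:40]))
    if i > 0:
        edges.append((i - 1, i, "flow"))  # sequential control flow
    for n in ast.walk(stmt):              # def-use (data-flow) edges
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load):
            if n.id in last_def:
                edges.append((last_def[n.id], i, f"use:{n.id}"))
    for n in ast.walk(stmt):              # record new definitions last
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store):
            last_def[n.id] = i

print("nodes:", nodes)
print("edges:", edges)
```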
- Comparing large language models and human programmers for generating programming code [0.0]
GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2.
In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants.
arXiv Detail & Related papers (2024-03-01T14:43:06Z)
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement [58.034012276819425]
We introduce OpenCodeInterpreter, a family of open-source code systems for generating, executing, and iteratively refining code.
Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance.
arXiv Detail & Related papers (2024-02-22T16:06:23Z)
- Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks, but their performance on programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug with a "print debugging" method (a sketch follows this entry).
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
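A minimal sketch of what such a loop could look like is below: the model is asked to instrument failing code with print statements, the instrumented code is executed with stdout captured, and the trace is fed back to request a fix. The prompts and the `llm` callable (any prompt-to-text function, e.g. a chat-completion wrapper) are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of one LLM "print debugging" round. `llm` is a hypothetical
# prompt -> text callable (e.g., a wrapper around a chat-completion API).
import io
import contextlib

def print_debug_round(llm, buggy_code: str) -> str:
    # 1) Ask the model to add print statements around the suspect logic.
    instrumented = llm(
        "Insert print statements that trace intermediate values in this "
        "code. Reply with plain Python only.\n\n" + buggy_code
    )
    # 2) Run the instrumented code and capture everything it prints.
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(instrumented, {})  # sandbox this in real use
    except Exception as exc:
        buf.write(f"\n[crashed: {exc!r}]")
    trace = buf.getvalue()
    # 3) Feed the trace back and ask for a corrected program.
    return llm(
        "Given this print-debugging trace, fix the bug. Reply with plain "
        "Python only.\n\nCODE:\n" + buggy_code + "\n\nTRACE:\n" + trace
    )
```

A full harness would repeat this round against failing test cases until they pass or a retry budget runs out.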
- CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z)
- AI-assisted coding: Experiments with GPT-4 [0.22366638308792727]
GPT-4 can generate tests with substantial coverage, but many of the tests fail when applied to the associated code (see the sketch after this entry).
These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure the validity and accuracy of the results.
arXiv Detail & Related papers (2023-04-25T22:59:01Z)
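One simple way to surface that failure rate is to run the generated tests directly. The sketch below assumes the GPT-4-written tests were saved to a hypothetical test_generated.py alongside the code under test and invokes pytest on that file.

```python
# Sketch: execute model-generated tests and report how many fail.
# "test_generated.py" is a hypothetical file of GPT-4-written tests.
import subprocess

result = subprocess.run(
    ["pytest", "test_generated.py", "-q", "--tb=no"],
    capture_output=True, text=True,
)
print(result.stdout)  # pytest's own summary, e.g. "3 passed, 2 failed"
# A nonzero exit code means some generated tests fail against the code,
# which is exactly where a human reviewer still has to step in.
print("exit code:", result.returncode)
```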
- Visual Instruction Tuning [79.70923292053097]
We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data.
By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
arXiv Detail & Related papers (2023-04-17T17:59:25Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems (the metrics are sketched below).
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
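For reference, "passing approximately 15% of the test cases" corresponds to APPS's test-case average; the benchmark also reports strict accuracy, where a problem only counts if every test case passes. A small sketch of both metrics on hypothetical per-problem results:

```python
# Sketch: the two headline APPS-style metrics, on hypothetical results.
# Each inner list holds True/False per test case for one generated solution.
results = [
    [True, True, False],   # problem 1: 2 of 3 cases pass
    [True, True, True],    # problem 2: all cases pass
    [False, False],        # problem 3: none pass
]

# Average fraction of test cases passed, averaged over problems.
test_case_avg = sum(sum(r) / len(r) for r in results) / len(results)

# Strict accuracy: fraction of problems whose solutions pass every test case.
strict_acc = sum(all(r) for r in results) / len(results)

print(f"test-case average: {test_case_avg:.2%}")  # 55.56%
print(f"strict accuracy:  {strict_acc:.2%}")      # 33.33%
```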
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.