Evaluating GPT's Programming Capability through CodeWars' Katas
- URL: http://arxiv.org/abs/2306.01784v1
- Date: Wed, 31 May 2023 10:36:16 GMT
- Title: Evaluating GPT's Programming Capability through CodeWars' Katas
- Authors: Zizhuo Zhang, Lian Wen, Shaoyang Zhang, David Chen, Yanfei Jiang
- Abstract summary: This paper presents a novel evaluation of the programming proficiency of Generative Pretrained Transformer (GPT) models.
The experiments reveal a distinct boundary at the 3kyu level, beyond which these GPT models struggle to provide solutions.
The research emphasizes the need for validation and creative thinking capabilities in AI models to better emulate human problem-solving techniques.
- Score: 0.5512295869673147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the burgeoning field of artificial intelligence (AI), understanding the
capabilities and limitations of programming-oriented models is crucial. This
paper presents a novel evaluation of the programming proficiency of Generative
Pretrained Transformer (GPT) models, specifically GPT-3.5 and GPT-4, against
coding problems of varying difficulty levels drawn from Codewars. The
experiments reveal a distinct boundary at the 3kyu level, beyond which these
GPT models struggle to provide solutions. These findings led to the proposal of
a measure for coding problem complexity that incorporates both problem
difficulty and the time required for solution. The research emphasizes the need
for validation and creative thinking capabilities in AI models to better
emulate human problem-solving techniques. Future work aims to refine this
proposed complexity measure, enhance AI models with these suggested
capabilities, and develop an objective measure for programming problem
difficulty. The results of this research offer invaluable insights for
improving AI programming capabilities and advancing the frontier of AI
problem-solving abilities.
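The abstract's proposed complexity measure combines problem difficulty with the time required for a solution, but does not give its exact form here. The sketch below is a minimal illustrative guess: the inversion of the kyu rank and the log time-scaling are assumptions, not the measure defined in the paper.

```python
import math

def kata_complexity(kyu: int, solve_time_minutes: float) -> float:
    """Hypothetical score combining Codewars difficulty with solve time.

    Codewars kyu ranks run from 8 (easiest) down to 1 (hardest), so the
    rank is inverted into an ascending difficulty term and weighted by
    log-scaled solve time. The functional form is an illustrative
    assumption, not the paper's actual formula.
    """
    difficulty = 9 - kyu  # 8 kyu -> 1, 1 kyu -> 8
    return difficulty * math.log1p(solve_time_minutes)
```

Under this toy form, a 3 kyu kata solved in 30 minutes scores higher than a 5 kyu kata solved in the same time, matching the intuition that harder problems and longer solve times both raise complexity.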
Related papers
- Estimating Difficulty Levels of Programming Problems with Pre-trained Model [18.92661958433282]
The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code.
For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model.
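The coupling described above can be sketched as follows. The embedding functions here are crude surface-statistic stand-ins for the pre-trained text and code encoders the paper actually uses, and the linear head with hand-picked weights is purely illustrative.

```python
import math

def embed_text(description: str) -> list[float]:
    # Stand-in for a pre-trained text encoder: crude surface statistics
    # so the sketch stays self-contained.
    words = description.split()
    return [float(len(words)), sum(map(len, words)) / max(len(words), 1)]

def embed_code(solution: str) -> list[float]:
    # Stand-in for a pre-trained code encoder: line count and loop count.
    return [float(len(solution.splitlines())),
            float(solution.count("for") + solution.count("while"))]

def estimate_difficulty(description: str, solution: str,
                        weights: list[float], bias: float) -> float:
    # Unified model: concatenate both modality embeddings and apply a
    # linear head squashed to (0, 1) as a normalized difficulty score.
    features = embed_text(description) + embed_code(solution)
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```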
arXiv Detail & Related papers (2024-06-13T05:38:20Z)
- The Role of Code Proficiency in the Era of Generative AI [10.524937623398003]
Generative AI models are becoming integral to the developer workspace.
However, challenges emerge due to the 'black box' nature of many of these models.
This position paper advocates for a 'white box' approach to these generative models.
arXiv Detail & Related papers (2024-04-08T06:20:42Z)
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are poorly suited to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
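The highlighting policy, marking tokens whose predicted edit likelihood crosses a threshold, can be sketched minimally. The bracket markers and the 0.5 threshold are illustrative assumptions; the study itself used visual highlighting in the editor.

```python
def highlight_uncertain(tokens: list[str], edit_probs: list[float],
                        threshold: float = 0.5) -> str:
    # Wrap tokens whose predicted edit likelihood exceeds the threshold
    # in markers, flagging the spans a programmer should inspect first.
    return " ".join(
        f"[{tok}]" if p > threshold else tok
        for tok, p in zip(tokens, edit_probs)
    )
```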
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- Explainable AI for Software Engineering [12.552048647904591]
We first highlight the need for explainable AI in software engineering.
Then, we summarize three successful case studies on how explainable AI techniques can be used to address the aforementioned challenges.
arXiv Detail & Related papers (2020-12-03T00:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.