Evaluating GPT's Programming Capability through CodeWars' Katas
- URL: http://arxiv.org/abs/2306.01784v1
- Date: Wed, 31 May 2023 10:36:16 GMT
- Title: Evaluating GPT's Programming Capability through CodeWars' Katas
- Authors: Zizhuo Zhang, Lian Wen, Shaoyang Zhang, David Chen, Yanfei Jiang
- Abstract summary: This paper presents a novel evaluation of the programming proficiency of Generative Pretrained Transformer (GPT) models.
The experiments reveal a distinct boundary at the 3kyu level, beyond which these GPT models struggle to provide solutions.
The research emphasizes the need for validation and creative thinking capabilities in AI models to better emulate human problem-solving techniques.
- Score: 0.5512295869673147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the burgeoning field of artificial intelligence (AI), understanding the
capabilities and limitations of programming-oriented models is crucial. This
paper presents a novel evaluation of the programming proficiency of Generative
Pretrained Transformer (GPT) models, specifically GPT-3.5 and GPT-4, against
coding problems of varying difficulty levels drawn from Codewars. The
experiments reveal a distinct boundary at the 3kyu level, beyond which these
GPT models struggle to provide solutions. These findings led to the proposal of
a measure for coding problem complexity that incorporates both problem
difficulty and the time required for solution. The research emphasizes the need
for validation and creative thinking capabilities in AI models to better
emulate human problem-solving techniques. Future work aims to refine this
proposed complexity measure, enhance AI models with these suggested
capabilities, and develop an objective measure for programming problem
difficulty. The results of this research offer invaluable insights for
improving AI programming capabilities and advancing the frontier of AI
problem-solving abilities.
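The abstract's proposed complexity measure combines problem difficulty with the time required for a solution, but does not give its exact form here. The sketch below is a minimal illustrative guess: the inversion of the kyu rank and the log time-scaling are assumptions, not the measure defined in the paper.

```python
import math

def kata_complexity(kyu: int, solve_time_minutes: float) -> float:
    """Hypothetical score combining Codewars difficulty with solve time.

    Codewars kyu ranks run from 8 (easiest) down to 1 (hardest), so the
    rank is inverted into an ascending difficulty term and weighted by
    log-scaled solve time. The functional form is an illustrative
    assumption, not the paper's actual formula.
    """
    difficulty = 9 - kyu  # 8 kyu -> 1, 1 kyu -> 8
    return difficulty * math.log1p(solve_time_minutes)
```

Under this toy form, a 3 kyu kata solved in 30 minutes scores higher than a 5 kyu kata solved in the same time, matching the intuition that harder problems and longer solve times both raise complexity.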
Related papers
- Estimating Difficulty Levels of Programming Problems with Pre-trained Model [18.92661958433282]
The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning.
We formulate the problem of automatic difficulty level estimation of each programming problem, given its textual description and a solution example of code.
For tackling this problem, we propose to couple two pre-trained models, one for text modality and the other for code modality, into a unified model.
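The coupling described above can be sketched as follows. The embedding functions here are crude surface-statistic stand-ins for the pre-trained text and code encoders the paper actually uses, and the linear head with hand-picked weights is purely illustrative.

```python
import math

def embed_text(description: str) -> list[float]:
    # Stand-in for a pre-trained text encoder: crude surface statistics
    # so the sketch stays self-contained.
    words = description.split()
    return [float(len(words)), sum(map(len, words)) / max(len(words), 1)]

def embed_code(solution: str) -> list[float]:
    # Stand-in for a pre-trained code encoder: line count and loop count.
    return [float(len(solution.splitlines())),
            float(solution.count("for") + solution.count("while"))]

def estimate_difficulty(description: str, solution: str,
                        weights: list[float], bias: float) -> float:
    # Unified model: concatenate both modality embeddings and apply a
    # linear head squashed to (0, 1) as a normalized difficulty score.
    features = embed_text(description) + embed_code(solution)
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```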
arXiv Detail & Related papers (2024-06-13T05:38:20Z)
- The Role of Code Proficiency in the Era of Generative AI [10.524937623398003]
Generative AI models are becoming integral to the developer workspace.
However, challenges emerge due to the 'black box' nature of many of these models.
This position paper advocates for a 'white box' approach to these generative models.
arXiv Detail & Related papers (2024-04-08T06:20:42Z)
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are poorly suited to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
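The highlighting policy, marking tokens whose predicted edit likelihood crosses a threshold, can be sketched minimally. The bracket markers and the 0.5 threshold are illustrative assumptions; the study itself used visual highlighting in the editor.

```python
def highlight_uncertain(tokens: list[str], edit_probs: list[float],
                        threshold: float = 0.5) -> str:
    # Wrap tokens whose predicted edit likelihood exceeds the threshold
    # in markers, flagging the spans a programmer should inspect first.
    return " ".join(
        f"[{tok}]" if p > threshold else tok
        for tok, p in zip(tokens, edit_probs)
    )
```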
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- Explainable AI for Software Engineering [12.552048647904591]
We first highlight the need for explainable AI in software engineering.
Then, we summarize three successful case studies on how explainable AI techniques can be used to address the aforementioned challenges.
arXiv Detail & Related papers (2020-12-03T00:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.