Large Language Model Guided Self-Debugging Code Generation
- URL: http://arxiv.org/abs/2502.02928v1
- Date: Wed, 05 Feb 2025 06:43:40 GMT
- Title: Large Language Model Guided Self-Debugging Code Generation
- Authors: Muntasir Adnan, Zhiwei Xu, Carlos C. N. Kuhn
- Abstract summary: PyCapsule is a novel framework for Python code generation. It features prompt inference, iterative error handling, and case testing. It achieves up to a 5.7% improvement in success rate on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench.
- Score: 2.816120626533879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated code generation is gaining significant importance in intelligent computer programming and system deployment. However, current approaches often face challenges in computational efficiency and lack robust mechanisms for code parsing and error correction. In this work, we propose a novel framework, PyCapsule, with a simple yet effective two-agent pipeline and efficient self-debugging modules for Python code generation. PyCapsule features sophisticated prompt inference, iterative error handling, and case testing, ensuring high generation stability, safety, and correctness. Empirically, PyCapsule achieves up to a 5.7% improvement in success rate on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench compared to state-of-the-art methods. We also observe that the normalized success rate decreases as more self-debugging attempts are made, potentially because the error feedback retained across attempts is limited and noisy. PyCapsule demonstrates broader impacts on advancing lightweight and efficient code generation for artificial intelligence systems.
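To make the two-agent pipeline concrete, below is a minimal sketch of the generate-test-debug loop the abstract describes: a programmer agent drafts a solution, an executor runs the test cases, and error traces are fed back for a bounded number of repair attempts. The function names, prompts, and the `llm` callable are illustrative assumptions, not PyCapsule's actual implementation.

```python
# Minimal sketch of a two-agent generate/test/debug loop in the spirit of
# PyCapsule's pipeline. All names (solve, run_tests, the llm callable) are
# illustrative assumptions, not the paper's actual API.
import subprocess
import tempfile

MAX_DEBUG_ATTEMPTS = 3  # the abstract reports diminishing returns with more attempts

def run_tests(code: str, test_code: str) -> tuple[bool, str]:
    """Execute candidate code plus its test cases; return (passed, error_text)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=10)
    return proc.returncode == 0, proc.stderr

def solve(problem: str, test_code: str, llm) -> str | None:
    """Programmer agent drafts a solution; executor feedback drives repair."""
    code = llm(f"Write a Python solution for:\n{problem}")
    for _ in range(MAX_DEBUG_ATTEMPTS):
        passed, error = run_tests(code, test_code)
        if passed:
            return code
        # Feed the (possibly noisy) error trace back for another attempt.
        code = llm(f"The code below fails with:\n{error}\n"
                   f"Fix it and return only the corrected code.\n{code}")
    return None  # give up once the attempt budget is exhausted
```

Capping the number of attempts matters here: the abstract itself notes that the normalized success rate declines as more self-debugging attempts are made.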
Related papers
- MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning [0.0]
MalCodeAI is a language-agnostic pipeline for autonomous code security analysis and remediation. It combines code decomposition and semantic reasoning using fine-tuned Qwen2.5-Coder-3B-Instruct models. MalCodeAI supports red-hat-style exploit tracing, CVSS-based risk scoring, and zero-shot generalization to detect complex, zero-day vulnerabilities.
arXiv Detail & Related papers (2025-07-15T01:25:04Z)
- A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions [28.01600045250939]
We propose a programming language for code actions called Quasar. LLMs can write code in a subset of Python, which is automatically transpiled to Quasar. LLMs with Quasar actions retain strong performance while reducing execution time by 42% when possible.
arXiv Detail & Related papers (2025-06-13T20:11:22Z)
- Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks [0.0]
Low-precision arithmetic has revolutionized deep learning by enabling more efficient computation and reduced memory and energy consumption.
We develop the Pychop library, which supports customizable floating-point formats and a comprehensive set of rounding modes in Python.
In this paper, we offer a comprehensive exposition of the design, implementation, validation, and practical application of Pychop.
arXiv Detail & Related papers (2025-04-10T15:12:29Z)
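As a rough illustration of what the Pychop entry above describes, the sketch below emulates low-precision arithmetic by rounding each intermediate result to a narrower format. It uses NumPy's float16 as a stand-in and is not Pychop's actual API.

```python
# Generic illustration of emulating low-precision arithmetic by quantizing
# every intermediate result to a narrower format. This is NOT Pychop's API.
import numpy as np

def chop(x, dtype=np.float16):
    """Round values to the target format, then continue in double precision."""
    return np.asarray(x).astype(dtype).astype(np.float64)

# Summation drifts when every partial sum is chopped to half precision:
vals = np.full(10_000, 0.1)
exact = vals.sum()
low = 0.0
for v in vals:
    low = chop(low + v)  # quantize after each addition
print(f"float64 sum: {exact:.6f}, emulated fp16 sum: {float(low):.6f}")
```

Because float16 spacing grows with the magnitude of the running sum, the emulated total eventually stalls well below the true value, which is exactly the kind of rounding effect such emulation is meant to expose.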
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.
However, improvement is plateauing due to the exhaustion of readily available high-quality data.
We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
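A hypothetical sketch of one solver-verifier self-play round in the spirit of the Sol-Ver entry above: the same model writes both code and tests, and only mutually consistent pairs are kept as new training data. The function names, prompts, and the `execute` callable are assumptions, not the paper's interface.

```python
# Rough sketch of a solver/verifier self-play round: one model alternates
# between writing code and writing tests; pairs that agree are kept.
def self_play_round(problems, model, execute):
    """Collect (problem, code, tests) triples where code and tests agree."""
    accepted = []
    for problem in problems:
        code = model(f"Solve in Python:\n{problem}")
        tests = model(f"Write pytest-style asserts for:\n{problem}")
        if execute(code, tests):  # tests pass -> code and tests corroborate each other
            accepted.append((problem, code, tests))
    return accepted  # fine-tune the same model on these verified pairs
```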
- Intelligent Green Efficiency for Intrusion Detection [0.0]
This paper presents an assessment of different programming languages and Feature Selection (FS) methods to improve the performance of AI models.
Experiments were conducted using five ML models - Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory.
Results demonstrated that FS plays an important role in enhancing the computational efficiency of AI models without compromising detection accuracy.
arXiv Detail & Related papers (2024-11-11T15:01:55Z)
- Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation [0.0]
Large Language Models (LLMs) have become a popular choice for many Natural Language Processing (NLP) tasks.
LLMs' substantial computational and memory requirements often make them inaccessible to users with limited resources.
This paper focuses on very low-cost models which offer a more accessible alternative to resource-intensive LLMs.
arXiv Detail & Related papers (2024-04-17T08:16:48Z)
- DeVAIC: A Tool for Security Assessment of AI-generated Code [5.383910843560784]
DeVAIC (Detection of Vulnerabilities in AI-generated Code) is a tool to evaluate the security of AI-generated Python code.
arXiv Detail & Related papers (2024-04-11T08:27:23Z)
- Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP).
This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement.
Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Dynamic Transformers Provide a False Sense of Efficiency [75.39702559746533]
Multi-exit models trade off efficiency against accuracy, with the computational savings coming from exiting early.
We propose a simple yet effective attacking framework, SAME, which is specially tailored to reduce the efficiency of the multi-exit models.
Experiments on the GLUE benchmark show that SAME can effectively diminish the efficiency gain of various multi-exit models by 80% on average.
arXiv Detail & Related papers (2023-05-20T16:41:48Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
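The Self-Debugging entry above relies on few-shot demonstrations; the sketch below shows one plausible way such a prompt could be assembled. The demonstration and template are invented for illustration and are not taken from the paper.

```python
# Illustrative few-shot prompt assembly for self-debugging: the model sees
# worked (buggy code, feedback, fix) demonstrations before its own failure.
# The demonstration and template below are made up for illustration.
DEMOS = [
    ("def add(a, b): return a - b",
     "assert add(2, 3) == 5 failed: got -1",
     "def add(a, b): return a + b"),
]

def build_debug_prompt(code: str, feedback: str) -> str:
    parts = []
    for buggy, fb, fixed in DEMOS:
        parts.append(f"Buggy code:\n{buggy}\nFeedback:\n{fb}\nFixed code:\n{fixed}")
    parts.append(f"Buggy code:\n{code}\nFeedback:\n{feedback}\nFixed code:")
    return "\n\n".join(parts)
```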
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
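A minimal sketch of the reranking idea in the entry above: score every sampled program with a learned correctness estimator and submit only the top candidate, without executing any of them. The `ranker.predict_proba` call stands in for the paper's fault-aware model and is an assumption.

```python
# Sketch of execution-free reranking: a learned estimator scores each sampled
# program, and only the highest-scoring candidate is returned as the answer.
def rerank_pass_at_1(candidates: list[str], ranker) -> str:
    """Pick the sample the ranker believes is most likely to be correct."""
    scored = [(ranker.predict_proba(code), code) for code in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]  # raises IndexError if no candidates were sampled
```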
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)