Leveraging Print Debugging to Improve Code Generation in Large Language
Models
- URL: http://arxiv.org/abs/2401.05319v1
- Date: Wed, 10 Jan 2024 18:37:59 GMT
- Title: Leveraging Print Debugging to Improve Code Generation in Large Language
Models
- Authors: Xueyu Hu, Kun Kuang, Jiankai Sun, Hongxia Yang, Fei Wu
- Abstract summary: Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debugging" method.
- Score: 63.63160583432348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have made significant progress in code
generation tasks, but their performance in tackling programming problems with
complex data structures and algorithms remains suboptimal. To address this
issue, we propose an in-context learning approach that guides LLMs to debug by
using a "print debugging" method, which involves inserting print statements to trace execution and analysing the logs to fix the bug. We collect a Leetcode problem
dataset and evaluate our method using the Leetcode online judging system.
Experiments with GPT-4 demonstrate the effectiveness of our approach,
outperforming rubber duck debugging on easy and medium-level Leetcode problems
by 1.5% and 17.9%, respectively.
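For context, the workflow described above (generate a solution, have the model instrument it with print statements, run it to collect the logs, then feed those logs back for a fix) can be pictured as a small driver loop. The sketch below is illustrative only: the `complete` placeholder client, the helper names, and the prompt wording are assumptions, not the authors' implementation.

```python
# Minimal sketch of an LLM print-debugging loop. Illustrative only: the
# `complete` placeholder and the prompts are assumptions, not the paper's code.
import subprocess
import tempfile


def complete(prompt: str) -> str:
    """Placeholder for a chat/completion call to an LLM such as GPT-4."""
    raise NotImplementedError


def run_candidate(code: str, test_input: str) -> str:
    """Run a candidate Python solution on one test case and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        ["python", path], input=test_input,
        capture_output=True, text=True, timeout=10,
    )
    return proc.stdout + proc.stderr


def print_debug(problem: str, code: str, failing_input: str, rounds: int = 3) -> str:
    """Iteratively instrument, run, and repair a buggy solution."""
    for _ in range(rounds):
        # 1. Ask the model to insert print statements that trace key variables.
        instrumented = complete(
            f"Problem:\n{problem}\n\nBuggy solution:\n{code}\n\n"
            "Insert print statements that trace the relevant variables."
        )
        # 2. Execute the instrumented code on the failing test to collect logs.
        logs = run_candidate(instrumented, failing_input)
        # 3. Feed the logs back and ask for an analysis and a corrected solution.
        code = complete(
            f"Problem:\n{problem}\n\nInstrumented solution:\n{instrumented}\n\n"
            f"Execution log:\n{logs}\n\nExplain the bug and return a fixed solution."
        )
    return code
```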
Related papers
- Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL).
We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping.
ICCD emphasizes input-label mapping by contrasting the output distributions between positive and negative in-context examples.
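As a rough illustration of what contrasting two in-context distributions can look like at decoding time, here is a minimal sketch. The weighting scheme and the `alpha` parameter are assumptions chosen for illustration, not necessarily ICCD's exact formulation.

```python
import numpy as np


def contrastive_next_token_logits(logits_pos: np.ndarray,
                                  logits_neg: np.ndarray,
                                  alpha: float = 1.0) -> np.ndarray:
    """Contrast next-token logits computed with positive (correctly labelled)
    in-context examples against logits computed with negative (label-perturbed)
    examples, boosting tokens that actually depend on the input-label mapping.
    This is a generic contrastive-decoding form used only to illustrate the
    idea; ICCD's exact formulation may differ."""
    return (1.0 + alpha) * logits_pos - alpha * logits_neg


# Toy usage: pick the next token from the contrasted distribution.
vocab = 5
rng = np.random.default_rng(0)
pos, neg = rng.normal(size=vocab), rng.normal(size=vocab)
next_token = int(np.argmax(contrastive_next_token_logits(pos, neg)))
```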
arXiv Detail & Related papers (2025-02-19T14:04:46Z)
- Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation [0.24578723416255752]
We evaluate five different large language models (LLMs) concerning their capabilities for text-to-code generation.
ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama.
arXiv Detail & Related papers (2024-09-06T10:03:49Z)
- COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis [29.667170755786508]
We introduce DEBUGEVAL, a benchmark for evaluating the debugging abilities of Large Language Models.
We propose the COmmunicative Agent-based data SynThesis (COAST) framework, which employs a multi-agent system to generate high-quality training data.
Results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data.
arXiv Detail & Related papers (2024-08-09T11:35:44Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs in incorrect code, covering three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so that only executed code contributes to the update.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks.
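The masking idea behind FGO, letting only code that the tests actually executed contribute to the update, can be sketched as a masked token-level loss. This is a simplified, supervised stand-in for illustration; StepCoder's actual reinforcement-learning objective is more involved, and in practice the execution mask would come from a coverage tool.

```python
import torch
import torch.nn.functional as F


def masked_code_loss(logits: torch.Tensor,        # (batch, seq_len, vocab)
                     target_ids: torch.Tensor,    # (batch, seq_len)
                     executed_mask: torch.Tensor  # (batch, seq_len); 1 where the
                     ) -> torch.Tensor:           # token lies in executed code
    """Token-level loss in which unexecuted code segments are masked out, so
    only code exercised by the tests drives the gradient. A simplified
    stand-in for StepCoder's fine-grained RL objective, not its actual loss."""
    per_token = F.cross_entropy(
        logits.transpose(1, 2), target_ids, reduction="none"
    )  # shape: (batch, seq_len)
    mask = executed_mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```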
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- LLM4TDD: Best Practices for Test Driven Development Using Large Language Models [0.76146285961466]
This paper explores the concept of LLM4TDD, where we guide Large Language Models to generate code iteratively using a test-driven development methodology.
We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate the impact of different test, prompt and problem attributes on the efficacy of LLM4TDD.
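For illustration, a test-first generation loop of this kind can be driven by an ordinary test runner: the tests are fixed up front and each failing report is fed back to the model. The `complete` placeholder, the prompts, and the use of pytest below are assumptions for the sketch, not the paper's exact protocol.

```python
import subprocess


def complete(prompt: str) -> str:
    """Placeholder for a ChatGPT-style completion call."""
    raise NotImplementedError


def run_tests() -> tuple[bool, str]:
    """Run the fixed test suite and return (passed, report)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def tdd_generate(spec: str, test_file: str, impl_file: str, max_iters: int = 5) -> bool:
    """Iteratively ask the model for an implementation until the tests pass."""
    tests = open(test_file).read()
    code = complete(
        f"Spec:\n{spec}\n\nTests:\n{tests}\n\nWrite an implementation that passes these tests."
    )
    for _ in range(max_iters):
        with open(impl_file, "w") as f:
            f.write(code)
        passed, report = run_tests()
        if passed:
            return True
        # Feed the failing test report back and ask for a revision.
        code = complete(
            f"Tests:\n{tests}\n\nCurrent implementation:\n{code}\n\n"
            f"Test report:\n{report}\n\nRevise the implementation so all tests pass."
        )
    return False
```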
arXiv Detail & Related papers (2023-12-07T20:37:54Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.