CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
- URL: http://arxiv.org/abs/2308.08784v2
- Date: Fri, 23 Feb 2024 04:56:37 GMT
- Title: CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
- Authors: Dong Huang, Qingwen Bu, Yuhao Qing, Heming Cui
- Abstract summary: Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably for its efficacy in complex reasoning tasks.
We present Code Chain-of-Thought (CodeCoT) that integrates CoT with a self-examination process for code generation.
- Score: 6.139760107605468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably
for its efficacy in complex reasoning tasks, such as mathematical proofs.
However, its application to code generation faces a distinct challenge: although
code generated with CoT reasoning is logically correct, it often contains syntax
errors (e.g., invalid-syntax reports) that surface during code execution, which
can drive the pass@1 of CoT on HumanEval even lower than the zero-shot result.
In this paper, we present Code Chain-of-Thought (CodeCoT), which integrates CoT
with a self-examination process for code generation. CodeCoT begins with the LLM
using CoT to draft the initial code so that it follows the correct logic flow.
CodeCoT then generates test cases to check whether the code raises syntax errors
during execution. Next, in a self-examination phase, the generated code is
executed against these test cases in a local environment. If the local
environment reports errors (e.g., an invalid-syntax error), CodeCoT iteratively
refines the code based on this feedback. Within this loop, CodeCoT ensures that
the generated code not only follows the logic of the task description but also
has its syntax errors resolved through self-examination. Our evaluation shows
that CodeCoT improves the effectiveness of code generation: for example, it
increases pass@1 from 75.6% to 79.3% on the HumanEval dataset.
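The loop described above maps naturally onto a generate-test-execute-refine cycle. The sketch below is a minimal illustration of that cycle, not the authors' implementation; the llm_* helpers are hypothetical stand-ins for the LLM calls that the abstract leaves unspecified.

```python
# Minimal sketch of a CodeCoT-style self-examination loop, assuming hypothetical
# llm_* helpers for the LLM calls (the paper does not prescribe these interfaces).
import subprocess
import tempfile


def llm_generate_code(task: str) -> str:
    """Hypothetical: draft code for the task with chain-of-thought prompting."""
    raise NotImplementedError


def llm_generate_tests(task: str) -> str:
    """Hypothetical: generate test cases (e.g., assert statements) for the task."""
    raise NotImplementedError


def llm_refine_code(task: str, code: str, error: str) -> str:
    """Hypothetical: revise the code given execution error feedback."""
    raise NotImplementedError


def run_locally(code: str, tests: str, timeout: int = 10) -> str:
    """Execute code plus tests in a subprocess; return stderr, or '' on success."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    proc = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=timeout
    )
    return proc.stderr if proc.returncode != 0 else ""


def codecot(task: str, max_rounds: int = 5) -> str:
    """CoT draft -> self-generated tests -> local execution -> refine on errors."""
    code = llm_generate_code(task)
    tests = llm_generate_tests(task)
    for _ in range(max_rounds):
        error = run_locally(code, tests)
        if not error:  # no syntax or runtime errors reported by the local run
            break
        code = llm_refine_code(task, code, error)
    return code
```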
Related papers
- CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation [10.048098631259876]
Code generation aims to automatically produce code that fulfills requirements written in natural language.
Large Language Models (LLMs) like ChatGPT fail to ensure the syntactic and semantic correctness of the generated code.
We propose CodeCoR, a self-reflective multi-agent framework that evaluates the effectiveness of each agent and their collaborations.
arXiv Detail & Related papers (2025-01-14T03:21:10Z)
- Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling [4.597983734278579]
Tree-of-Code boosts accuracy by nearly 20% over CodeAct with fewer than 1/4 of the turns.
Several LLMs even perform better on one-turn CodeProgram than on multi-turn CodeAct.
arXiv Detail & Related papers (2024-12-19T12:31:22Z)
- Contextualized Data-Wrangling Code Generation in Computational Notebooks [131.26365849822932]
We propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency.
We construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks.
Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation.
arXiv Detail & Related papers (2024-09-20T14:49:51Z)
- CodeSift: An LLM-Based Reference-Less Framework for Automatic Code Validation [3.22798929957223]
Large language models (LLMs) have greatly facilitated code generation, but ensuring the functional correctness of generated code remains a challenge.
Traditional validation methods are often time-consuming, error-prone, and impractical for large volumes of code.
We introduce CodeSift, a novel framework that leverages LLMs as the first-line filter of code validation without the need for execution, reference code, or human feedback.
arXiv Detail & Related papers (2024-08-28T08:32:21Z)
- Code Documentation and Analysis to Secure Software Development [0.0]
CoDAT is a tool designed to maintain consistency between the various levels of code documentation.
It is implemented in IntelliJ IDEA.
We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it.
arXiv Detail & Related papers (2024-07-16T17:25:44Z)
- Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [29.136378191436396]
We present CoCoGen, a new code generation approach that uses compiler feedback to improve the LLM-generated code.
CoCoGen first leverages static analysis to identify mismatches between the generated code and the project's context.
It then iteratively aligns and fixes the identified errors using information extracted from the code repository.
arXiv Detail & Related papers (2024-03-25T14:07:27Z)
- INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair [42.5403218101046]
INTERVENOR is a system designed to emulate the interactive code repair processes observed in humans.
LLMs play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.
arXiv Detail & Related papers (2023-11-16T12:55:20Z)
- COCO: Testing Code Generation Systems via Concretized Instructions [33.13427092832396]
COCO is a technique to test the robustness of code generation systems.
It exploits the usage scenario of code generation systems to make the original programming instruction more concrete.
We evaluated COCO on eight advanced code generation systems, including commercial tools such as Copilot and ChatGPT.
arXiv Detail & Related papers (2023-08-25T11:49:27Z)
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language- and platform-agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
- Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
We present SCodeR, a soft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between code snippets in a large-scale code corpus, soft-labeled contrastive pre-training can obtain fine-grained soft labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution (see the illustrative sketch after this list).
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code.
We evaluate our approach on the code completion task for the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
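Referenced from the CodeT entry above: its selection step (generate tests, execute candidate solutions against them, keep the best-scoring one) can be sketched as follows. This is an illustrative simplification under assumed interfaces, not the CodeT implementation; CodeT ranks candidates with a dual-execution-agreement score, whereas this sketch simply counts passed tests, and run_candidate is a hypothetical helper.

```python
# Illustrative sketch of test-based candidate selection in the spirit of CodeT
# (not the authors' code). Each candidate is executed against a shared set of
# generated tests; the candidate passing the most tests is returned.
import subprocess
import tempfile


def run_candidate(solution: str, test: str, timeout: int = 5) -> bool:
    """Return True if `solution` followed by the single `test` runs without error."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + test)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False


def select_best(solutions: list[str], tests: list[str]) -> str:
    """Pick the candidate that passes the largest number of generated tests."""
    scores = [sum(run_candidate(s, t) for t in tests) for s in solutions]
    return solutions[scores.index(max(scores))]
```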