Structured Chain-of-Thought Prompting for Code Generation
- URL: http://arxiv.org/abs/2305.06599v3
- Date: Thu, 7 Sep 2023 11:39:07 GMT
- Title: Structured Chain-of-Thought Prompting for Code Generation
- Authors: Jia Li, Ge Li, Yongmin Li, Zhi Jin
- Abstract summary: Chain-of-Thought (CoT) prompting is the state-of-the-art prompting technique.
We propose Structured CoTs (SCoTs) and present a novel prompting technique for code generation, named SCoT prompting.
- Score: 48.43888515848583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) (e.g., ChatGPT) have shown impressive
performance in code generation. LLMs take prompts as inputs, and
Chain-of-Thought (CoT) prompting is the state-of-the-art prompting technique.
CoT prompting asks LLMs first to generate CoTs (i.e., intermediate natural
language reasoning steps) and then output the code. However, CoT prompting is
designed for natural language generation and has low accuracy in code
generation.
In this paper, we propose Structured CoTs (SCoTs) and present a novel
prompting technique for code generation, named SCoT prompting. Our motivation
is that source code contains rich structural information and that any code can
be composed of three program structures (i.e., sequence, branch, and loop
structures). Intuitively, structured intermediate reasoning steps make for
structured source code. Thus, we ask LLMs to use program structures to build
CoTs, obtaining SCoTs. Then, LLMs generate the final code based on the SCoTs.
Compared to CoT prompting, SCoT prompting explicitly constrains LLMs to think
about how to solve requirements from the viewpoint of source code and thereby
improves the performance of LLMs in code generation. We apply SCoT prompting
to two LLMs (i.e., ChatGPT and Codex) and evaluate it on three benchmarks
(i.e., HumanEval, MBPP, and MBCPP). (1) SCoT prompting outperforms the
state-of-the-art baseline, CoT prompting, by up to 13.79% in Pass@1. (2) Human
evaluation shows that human developers prefer programs generated by SCoT
prompting. (3) SCoT prompting is robust to the choice of exemplars and still
achieves substantial improvements.
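To make the idea concrete, here is a minimal sketch of what an SCoT-style prompt and completion might look like for one HumanEval-style task. The exact prompt wording, section headers, and exemplar format are assumptions for illustration, not the authors' released prompts; only the use of the three program structures (sequence, branch, loop) in the intermediate steps follows the paper.

```python
# Hypothetical SCoT-style prompt: the structured chain of thought is written
# with only the three program structures named in the paper (sequence,
# branch, loop) before the model is asked to emit the final code.
SCOT_PROMPT = '''
### Requirement
Write a function first_duplicate(nums) that returns the first value
that appears twice in nums, or None if all values are unique.

### Structured chain of thought
Input: nums, a list of integers
Output: the first duplicated value, or None
1: seen = an empty set          # sequence
2: for x in nums:               # loop
3:     if x in seen:            # branch
4:         return x
5:     add x to seen
6: return None

### Code
'''

# A faithful completion of the prompt above:
def first_duplicate(nums):
    seen = set()
    for x in nums:
        if x in seen:   # branch: duplicate found
            return x
        seen.add(x)     # sequence: remember the value
    return None         # loop finished without finding a duplicate

assert first_duplicate([3, 1, 4, 1, 5]) == 1
assert first_duplicate([1, 2, 3]) is None
```

Unlike a free-form CoT, the intermediate steps are already organized as sequence, branch, and loop structures, so the final program is close to a direct transcription of the reasoning.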
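On the metric: this abstract does not state how the authors compute Pass@1, so the sketch below assumes the standard unbiased pass@k estimator introduced with the HumanEval benchmark (Chen et al., 2021), under which pass@1 reduces to the fraction of sampled programs that pass the unit tests.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem:
    n = total samples drawn, c = samples that pass the tests.
    Returns 1 - C(n-c, k) / C(n, k); for k = 1 this is simply c / n."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 200 samples per problem, 38 of which pass -> pass@1 = 0.19
print(pass_at_k(200, 38, 1))  # 0.19
```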
Related papers
- CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt [3.2047924365529026]
We propose the Code Completion Prompt (CoCoP) method, which transforms the text classification problem into a code completion task.
CoCoP significantly improves text classification performance across diverse datasets by utilizing LLMs' code-completion capability.
arXiv Detail & Related papers (2024-11-13T19:12:02Z)
- InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct [43.7550233177368]
We propose INVERSE-INSTRUCT, which summarizes instructions from code snippets instead of the reverse.
We present a series of code LLMs named InverseCoder, which surpasses the performance of the original code LLMs on a wide range of benchmarks.
arXiv Detail & Related papers (2024-07-08T08:00:05Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities.
We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution.
We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
- Can Separators Improve Chain-of-Thought Prompting? [10.398343318429367]
Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs).
Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting.
arXiv Detail & Related papers (2024-02-16T12:46:16Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- A Prompt Learning Framework for Source Code Summarization [24.33455799484519]
We propose a novel prompt learning framework for code summarization called PromptCS.
PromptCS trains a prompt agent that can generate continuous prompts to unleash the potential of LLMs in code summarization.
We evaluate PromptCS on the CodeSearchNet dataset involving multiple programming languages.
arXiv Detail & Related papers (2023-12-26T14:37:55Z)
- kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning [50.40636157214161]
Task-Oriented Parsing (TOP) enables conversational assistants to interpret user commands expressed in natural language.
LLMs have achieved impressive performance in generating computer programs from natural language prompts.
This paper focuses on harnessing the capabilities of LLMs for semantic parsing tasks.
arXiv Detail & Related papers (2023-12-17T17:26:50Z)
- Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation [22.219645213202178]
This paper proposes the "Semantic Chain-of-Thought" approach, named SeCoT, to introduce semantic information of code.
We show that SeCoT achieves state-of-the-art performance, greatly improving the potential of large models for code generation.
arXiv Detail & Related papers (2023-10-16T05:09:58Z)
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.