Chain-of-Thought in Neural Code Generation: From and For Lightweight
Language Models
- URL: http://arxiv.org/abs/2312.05562v1
- Date: Sat, 9 Dec 2023 12:20:50 GMT
- Title: Chain-of-Thought in Neural Code Generation: From and For Lightweight
Language Models
- Authors: Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, Taolue
Chen
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable potential in code generation.
In this study, we investigate lightweight Language Models (lLMs), which are defined to have fewer than 10 billion parameters.
Based on these findings, we design a novel approach, COTTON, which leverages lLMs to automatically generate Chains of Thought (CoTs) for code generation.
The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics.
- Score: 23.727775288971003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in code
generation. The integration of Chain of Thought (CoT) reasoning can further
boost their performance. However, current CoT methods often require manual
writing or LLMs with over 100 billion parameters to generate, impeding their
applicability in resource-constrained scenarios. In this study, we investigate
lightweight Language Models (lLMs), which are defined to have fewer than 10
billion parameters. Empirically, we find that most lLMs cannot generate
high-quality CoTs when prompted by the few-shot method, but can take advantage
of high-quality CoTs generated elsewhere to improve their performance in code
generation. Based on these findings, we design a novel approach, COTTON, which
leverages lLMs to automatically generate CoTs for code generation. We
synthesize new datasets and conduct extensive experiments on various
benchmarks. The results show that the CoTs generated by COTTON outperform the
baselines in terms of automated and human evaluation metrics. In particular,
the CoTs generated by COTTON boost various lLMs to achieve higher performance
gains than those generated by LLMs such as ChatGLM (130B), and are competitive
with those generated by gpt-3.5-turbo (175B). Our study also showcases the
potential of lLMs in software engineering applications.
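To make the pipeline concrete, below is a minimal sketch of the two-stage idea described in the abstract: a lightweight model first drafts a CoT, which is then prepended to the prompt of a lightweight code model. The checkpoints, prompt templates, and function names are illustrative assumptions, not the released COTTON implementation.

    # Minimal sketch of CoT-then-code generation with two lightweight models.
    # The model names below are hypothetical placeholders.
    from transformers import pipeline

    cot_generator = pipeline("text-generation", model="cot-generator-1b")    # hypothetical
    code_generator = pipeline("text-generation", model="code-generator-7b")  # hypothetical

    def generate_code(problem: str) -> str:
        # Stage 1: a lightweight model drafts a step-by-step plan (the CoT).
        cot_prompt = f"Describe step by step how to solve:\n{problem}\nSteps:"
        cot = cot_generator(cot_prompt, max_new_tokens=128)[0]["generated_text"]
        # Stage 2: the code model is conditioned on the problem plus the CoT.
        code_prompt = f"{problem}\n\n# Plan:\n{cot}\n\n# Solution:\n"
        return code_generator(code_prompt, max_new_tokens=256)[0]["generated_text"]

This mirrors the empirical finding above: the lLM need not invent a high-quality CoT via few-shot prompting, it only needs to consume one at code-generation time.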
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated compared to canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
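A minimal sketch of such a training-free critique-and-repair loop, assuming a generic llm completion callable (the prompt wording and the use of py_compile as the compiler check are assumptions, not the paper's implementation):

    import subprocess, tempfile

    def critique_and_repair(llm, code: str, max_rounds: int = 3) -> str:
        for _ in range(max_rounds):
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)                  # persist the candidate to disk
            check = subprocess.run(["python", "-m", "py_compile", f.name],
                                   capture_output=True, text=True)
            if check.returncode == 0:
                return code                    # compiles cleanly; stop iterating
            prompt = (f"The following code fails to compile:\n{code}\n"
                      f"Compiler feedback:\n{check.stderr}\n"
                      "Provide a corrected version:\n")
            code = llm(prompt)                 # model critiques and repairs its own output
        return code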
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Applying RLAIF for Code Generation with API-usage in Lightweight LLMs [15.366324461797582]
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains.
This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (1B parameters) LLMs.
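A rough sketch of the AI-feedback signal such a framework could use, assuming a stronger judge model scores candidates and the score becomes the reward (the prompt and parsing are hypothetical, not the paper's framework):

    def rlaif_reward(judge, task: str, candidate: str) -> float:
        # Ask the judge model for a 0-10 rating of API-usage correctness.
        verdict = judge(f"Rate from 0 to 10 how correctly this code uses the target API.\n"
                        f"Task: {task}\nCode:\n{candidate}\nScore:")
        try:
            return float(verdict.strip().split()[0]) / 10.0
        except (ValueError, IndexError):
            return 0.0   # unparsable verdict yields zero reward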
arXiv Detail & Related papers (2024-06-28T17:16:03Z)
- ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation [9.409062607311528]
Large language models (LLMs) have demonstrated excellent performance in understanding human instructions and generating code.
We introduce a simple yet effective iterative training paradigm named ITERTL.
We show that the model trained with our proposed approach can compete with, and even outperform, the state-of-the-art (SOTA) open-source model.
arXiv Detail & Related papers (2024-06-28T01:44:57Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
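A toy sketch of one such perturbation, assuming reasoning graphs are stored as dictionaries of arithmetic steps (the representation is an illustrative assumption, not DARG's actual data structure):

    import random

    def perturb_graph(graph: dict, new_id: str) -> dict:
        """graph maps node id -> (operation, [input node ids]); leaves have no inputs."""
        parents = random.sample(list(graph), k=2)    # pick two existing steps as inputs
        graph = dict(graph)                          # copy so the original stays intact
        graph[new_id] = (random.choice(["+", "-", "*"]), parents)
        return graph                                 # one extra node = one more reasoning step

    seed = {"a": ("lit", []), "b": ("lit", []), "c": ("+", ["a", "b"])}
    harder = perturb_graph(seed, "d")                # a slightly more complex test case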
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- RAG-Enhanced Commit Message Generation [8.858678357308726]
Commit Message Generation has become a research hotspot in automated software engineering.
We propose REACT, a novel REtrieval-Augmented framework for CommiT message generation.
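A minimal sketch of the retrieval-augmented idea, assuming a plain text-similarity retriever and a generic llm callable (not REACT's actual retriever or prompt):

    from difflib import SequenceMatcher

    def generate_commit_message(llm, diff: str, history: list[tuple[str, str]]) -> str:
        # history holds (past_diff, past_message) pairs from the project log.
        exemplar_diff, exemplar_msg = max(
            history, key=lambda pair: SequenceMatcher(None, diff, pair[0]).ratio())
        prompt = (f"Example diff:\n{exemplar_diff}\nExample message: {exemplar_msg}\n\n"
                  f"New diff:\n{diff}\nCommit message:")
        return llm(prompt)   # the retrieved exemplar grounds the generation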
arXiv Detail & Related papers (2024-06-08T16:24:24Z)
- UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing [27.45301385265713]
We present UniTSyn, a large-scale dataset capable of enhancing the prowess of LLMs for unit test synthesis.
By leveraging the Language Server Protocol, UniTSyn achieves the challenging goal of collecting focal-test pairs without per-project execution or per-language setups.
Experiments demonstrate that, by building an autoregressive model based on UniTSyn, we can achieve significant benefits in learning and understanding unit test representations.
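Conceptually, the LSP trick can be sketched as follows, where goto_definition stands in for a real language-server client and the object attributes are assumed for illustration only:

    def collect_focal_test_pairs(test_functions, goto_definition):
        pairs = []
        for test in test_functions:
            for call_site in test.calls:           # calls made inside the test body
                focal = goto_definition(call_site)  # LSP "go to definition" request
                if focal is not None and not focal.path.startswith("tests/"):
                    pairs.append((focal, test))     # (function under test, its test)
        return pairs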
arXiv Detail & Related papers (2024-02-04T22:48:05Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so that only executed code contributes to the optimization signal.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
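The FGO idea can be sketched as a masked token-level loss, where only tokens on executed lines contribute (how the mask is built from execution traces is assumed here, not taken from the paper's code):

    import torch
    import torch.nn.functional as F

    def fgo_loss(logits, targets, executed_mask):
        # logits: (seq, vocab); targets: (seq,) token ids;
        # executed_mask: (seq,) float, 1.0 for tokens on lines that actually ran.
        per_token = F.cross_entropy(logits, targets, reduction="none")
        masked = per_token * executed_mask                 # zero loss on unexecuted tokens
        return masked.sum() / executed_mask.sum().clamp(min=1)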
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
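One way to sketch such fusion is multi-teacher distillation over aligned token distributions; the simple averaging below is an illustrative assumption rather than the paper's exact fusion function:

    import torch
    import torch.nn.functional as F

    def fusion_loss(student_logits, teacher_logits_list):
        # All logits are (seq, vocab) over a shared, pre-aligned vocabulary.
        teacher_probs = torch.stack(
            [F.softmax(t, dim=-1) for t in teacher_logits_list]).mean(dim=0)
        log_student = F.log_softmax(student_logits, dim=-1)
        # Train the target model to match the source models' combined distribution.
        return F.kl_div(log_student, teacher_probs, reduction="batchmean")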
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
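A toy sketch of the seed trick: a perturbation direction is regenerated deterministically from a shared seed, so a client only needs to transmit the seed identifier and one scalar (the step details and shapes here are assumptions, not FedKSeed's exact procedure):

    import torch

    def zo_step(model_params, loss_fn, seed: int, eps: float = 1e-3):
        gen = torch.Generator().manual_seed(seed)            # shared, replayable seed
        direction = torch.randn(model_params.numel(), generator=gen)
        flat = model_params.detach().flatten()
        loss_plus = loss_fn(flat + eps * direction)
        loss_minus = loss_fn(flat - eps * direction)
        scalar_grad = (loss_plus - loss_minus) / (2 * eps)   # directional derivative
        return seed, float(scalar_grad)                      # all that crosses the network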
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
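A minimal sketch of one cleaning step in this spirit, assuming an llm callable and test cases that exercise an entry point named solve (both are illustrative assumptions, not the paper's pipeline):

    def clean_program(llm, code: str, test_cases) -> str:
        prompt = ("Rewrite this program with descriptive names and small helper "
                  f"functions, preserving behavior exactly:\n{code}\n")
        cleaned = llm(prompt)
        env = {}
        try:
            exec(cleaned, env)                   # define the rewritten solution
        except SyntaxError:
            return code                          # rewrite does not parse; discard it
        for args, expected in test_cases:
            if env["solve"](*args) != expected:  # `solve` is an assumed entry point
                return code                      # behavior changed; keep the original
        return cleaned                           # readable and behavior-preserving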
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation [22.219645213202178]
This paper proposes the "Semantic Chain-of-Thought" approach, named SeCoT, to introduce semantic information of code.
We show that SeCoT achieves state-of-the-art performance, greatly improving the potential of large models for code generation.
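An illustrative prompt template in the spirit of semantic CoT; the wording is an assumption, not the SeCoT paper's actual prompt:

    # The model is asked to reason about code semantics before writing code.
    SECOT_TEMPLATE = """\
    Problem: {problem}
    Let's think about the code semantics step by step:
    1. Data flow: describe the values entering and leaving the program.
    2. Control flow: describe the branches and loops required.
    3. Write the final code.
    """

    def secot_prompt(problem: str) -> str:
        return SECOT_TEMPLATE.format(problem=problem)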
arXiv Detail & Related papers (2023-10-16T05:09:58Z)