CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
- URL: http://arxiv.org/abs/2411.05199v1
- Date: Thu, 07 Nov 2024 21:51:07 GMT
- Title: CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
- Authors: Leitian Tao, Xiang Chen, Tong Yu, Tung Mai, Ryan Rossi, Yixuan Li, Saayan Mitra
- Abstract summary: We introduce CodeLutra, a novel framework that enhances low-performing large language models by leveraging both successful and failed code generation attempts.
Unlike conventional fine-tuning, CodeLutra employs an iterative preference learning mechanism to compare correct and incorrect solutions.
On a challenging data analysis task, using just 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's performance.
- Abstract: Large Language Models (LLMs) have significantly advanced code generation but often require substantial resources and tend to over-generalize, limiting their efficiency for specific tasks. Fine-tuning smaller, open-source LLMs presents a viable alternative; however, it typically lags behind cutting-edge models due to supervised fine-tuning's reliance solely on correct code examples, which restricts the model's ability to learn from its own mistakes and adapt to diverse programming challenges. To bridge this gap, we introduce CodeLutra, a novel framework that enhances low-performing LLMs by leveraging both successful and failed code generation attempts. Unlike conventional fine-tuning, CodeLutra employs an iterative preference learning mechanism to compare correct and incorrect solutions as well as maximize the likelihood of correct codes. Through continuous iterative refinement, CodeLutra enables smaller LLMs to match or surpass GPT-4's performance in various code generation tasks without relying on vast external datasets or larger auxiliary models. On a challenging data analysis task, using just 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's performance. These results highlight CodeLutra's potential to close the gap between open-source and closed-source models, making it a promising approach in the field of code generation.
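The abstract describes an objective that both compares correct and incorrect solutions and maximizes the likelihood of correct code. As a rough illustration only (the paper's exact loss is not reproduced here), the sketch below assumes a DPO-style pairwise preference term plus a supervised likelihood term; the function name and hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def preference_refinement_loss(policy_lp_correct, policy_lp_incorrect,
                               ref_lp_correct, ref_lp_incorrect,
                               beta=0.1, sft_weight=1.0):
    """Illustrative sketch (not the paper's code): a DPO-style preference
    term over (correct, incorrect) code pairs, plus a likelihood term on
    the correct solutions. Inputs are per-sequence summed log-probs."""
    # Preference term: reward the policy for widening the log-ratio margin
    # between correct and incorrect samples relative to a frozen reference.
    margin = beta * ((policy_lp_correct - ref_lp_correct)
                     - (policy_lp_incorrect - ref_lp_incorrect))
    pref_loss = -F.logsigmoid(margin).mean()
    # Likelihood term: directly maximize log p(correct code | prompt).
    sft_loss = -policy_lp_correct.mean()
    return pref_loss + sft_weight * sft_loss

# Toy usage with made-up per-sequence log-probabilities.
lp_c = torch.tensor([-12.0, -9.5])
lp_i = torch.tensor([-8.0, -10.0])
print(preference_refinement_loss(lp_c, lp_i,
                                 torch.tensor([-13.0, -10.0]),
                                 torch.tensor([-7.5, -9.8])))
```

Repeating such a step, regenerating candidates with the updated policy, labeling them correct or incorrect by execution, and re-fitting, would correspond to the continuous iterative refinement the abstract describes, without external datasets or larger auxiliary models.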
Related papers
- Precision or Peril: Evaluating Code Quality from Quantized Large Language Models [0.5249805590164902]
Quantization has emerged as a way to mitigate the memory overhead of Large Language Models.
This study aims to evaluate the current code generation capabilities of smaller LLMs using various metrics.
arXiv Detail & Related papers (2024-11-16T01:31:29Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Enhancing Discriminative Tasks by Guiding the Pre-trained Language Model with Large Language Model's Experience [4.814313782484443]
Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks.
We use LLMs to generate domain-specific data, thereby improving the performance of pre-trained LMs on the target tasks.
arXiv Detail & Related papers (2024-08-16T06:37:59Z)
- OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection [54.775409528658486]
OriGen is a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology.
Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets.
arXiv Detail & Related papers (2024-07-23T07:22:25Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
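As a hedged illustration of such a training-free critique-and-repair loop (the paper's actual prompting and interfaces are not shown here; `model` and `compile_fn` are assumed callables):

```python
def self_critique_repair(model, prompt, compile_fn, max_rounds=3):
    """Hypothetical sketch: generate code, collect compiler feedback,
    and ask the model to critique and fix its own output. `model` maps
    a text prompt to code; `compile_fn` returns (ok, diagnostics)."""
    code = model(prompt)
    for _ in range(max_rounds):
        ok, diagnostics = compile_fn(code)
        if ok:
            return code
        # Feed the failure back so the model can self-critique and retry.
        code = model(
            f"{prompt}\n\nPrevious attempt:\n{code}\n\n"
            f"Compiler feedback:\n{diagnostics}\n\n"
            "Critique the attempt above and return a corrected version."
        )
    return code
```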
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Applying RLAIF for Code Generation with API-usage in Lightweight LLMs [15.366324461797582]
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains.
This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (1B parameters) LLMs.
arXiv Detail & Related papers (2024-06-28T17:16:03Z)
- AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z)
- Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs [36.409470894115074]
We propose the CodePLAN framework, which aims to transfer LLMs' code generation reasoning capabilities to smaller models.
Our approach improves the smaller model's code generation performance by over 130% on the challenging APPS benchmark.
arXiv Detail & Related papers (2024-03-20T03:09:54Z)
- SEED: Customize Large Language Models with Sample-Efficient Adaptation for Code Generation [35.88318116340547]
We propose a novel adaptation approach named SEED, which stands for Sample-Efficient adaptation with Error-Driven learning for code generation.
We show that SEED achieves superior performance with few training samples, showing an average relative improvement of 54.7% in Pass@1 on multiple code generation benchmarks.
arXiv Detail & Related papers (2024-02-29T16:09:02Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking out unexecuted segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
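A minimal sketch of what FGO-style masking could look like in a policy-gradient loss (shapes, names, and normalization are assumptions, not the paper's implementation):

```python
import torch

def fgo_policy_loss(token_logps, advantages, executed_mask):
    """Hypothetical sketch of fine-grained optimization: only tokens in
    code segments that actually executed contribute to the loss; the
    rest are masked out. All tensors are [batch, seq_len] floats, with
    executed_mask equal to 1 for executed tokens and 0 otherwise."""
    masked = token_logps * advantages * executed_mask
    # Normalize by the executed-token count so masking does not shrink
    # the gradient scale.
    return -(masked.sum() / executed_mask.sum().clamp(min=1.0))
```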
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)