OpenCodeInterpreter: Integrating Code Generation with Execution and
Refinement
- URL: http://arxiv.org/abs/2402.14658v2
- Date: Wed, 28 Feb 2024 03:15:24 GMT
- Title: OpenCodeInterpreter: Integrating Code Generation with Execution and
Refinement
- Authors: Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin,
Jie Fu, Wenhu Chen, and Xiang Yue
- Abstract summary: We introduce OpenCodeInterpreter, a family of open-source code systems for generating, executing, and iteratively refining code.
Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance.
- Score: 58.034012276819425
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The introduction of large language models has significantly advanced code
generation. However, open-source models often lack the execution capabilities
and iterative refinement of advanced systems like the GPT-4 Code Interpreter.
To address this, we introduce OpenCodeInterpreter, a family of open-source code
systems designed for generating, executing, and iteratively refining code.
Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions,
OpenCodeInterpreter integrates execution and human feedback for dynamic code
refinement. Our comprehensive evaluation of OpenCodeInterpreter across key
benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus
reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves
an accuracy of 83.2 averaged over HumanEval and MBPP (76.4 on their plus
versions), closely rivaling GPT-4's 84.2 (76.2), and further improves to 91.6
(84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges
the gap between open-source code generation models and proprietary systems like
the GPT-4 Code Interpreter.
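The core loop the abstract describes (generate, execute, fold diagnostics back into the conversation, regenerate) can be sketched briefly. The code below is an illustrative reconstruction, not OpenCodeInterpreter's actual implementation; `model.generate` is a hypothetical single-call generation API.

```python
import subprocess
import tempfile

MAX_TURNS = 3  # bound the number of refinement rounds


def run_code(code: str) -> tuple[bool, str]:
    """Execute a candidate solution in a subprocess and capture diagnostics."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True,
                              text=True, timeout=10)
    except subprocess.TimeoutExpired:
        return False, "execution timed out"
    return proc.returncode == 0, proc.stderr


def generate_with_refinement(model, task: str) -> str:
    """Generate code, execute it, and feed diagnostics back for refinement."""
    prompt = task
    code = model.generate(prompt)  # hypothetical generation API
    for _ in range(MAX_TURNS):
        ok, diagnostics = run_code(code)
        if ok:
            break
        # Append execution feedback as a new conversation turn.
        prompt = f"{prompt}\n\nYour code failed with:\n{diagnostics}\nPlease fix it."
        code = model.generate(prompt)
    return code
```

In the paper's setting, the feedback turn may also be human feedback or GPT-4-synthesized critiques rather than a raw traceback.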
Related papers
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [91.15135237584771]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
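Read as a generic search procedure, the unified tree can be expanded best-first: pop the most promising node, ask the agent for children (new strategies, solutions, or refinements), score them, and push them back. The sketch below is a plain best-first skeleton under assumed `expand` and `score` callables; it is not CodeTree's actual agent logic.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    priority: float                          # lower = explored first
    content: str = field(compare=False)      # strategy, solution, or refinement
    depth: int = field(compare=False, default=0)


def best_first_search(root: str, expand, score, budget: int = 20) -> str:
    """Best-first search over a tree of coding strategies and solutions.

    `expand(node)` returns child candidates (assumed to call an LLM agent);
    `score(text)` ranks a candidate (assumed to combine execution results
    and agent feedback). Both are placeholders.
    """
    frontier = [Node(0.0, root)]
    best = None
    while frontier and budget > 0:
        node = heapq.heappop(frontier)
        budget -= 1
        if best is None or node.priority < best.priority:
            best = node
        for child_text in expand(node):
            # Negate the score so higher-scoring candidates pop first.
            heapq.heappush(frontier, Node(-score(child_text), child_text,
                                          node.depth + 1))
    return best.content if best else root
```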
arXiv Detail & Related papers (2024-11-07T00:09:54Z)
- ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation [39.778073569406175]
We present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance.
Experiments on three benchmarks, i.e., HumanEval (+), MBPP (+), and MultiPL-E, demonstrate that models fine-tuned with our method achieve state-of-the-art performance.
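One plausible shape for such a reflection sequence is a single supervised sample that interleaves the draft, its compiler diagnostics, and the corrected solution. The template below is a guess at the general structure, not ReflectionCoder's actual data format.

```python
def build_reflection_sample(task: str, draft: str,
                            compiler_msg: str, fixed: str) -> str:
    """Pack draft -> compiler feedback -> fix into one training sample.

    The section tags are illustrative; the paper's format may differ.
    """
    return (
        f"### Task\n{task}\n"
        f"### Draft\n{draft}\n"
        f"### Compiler feedback\n{compiler_msg}\n"
        f"### Final solution\n{fixed}\n"
    )
```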
arXiv Detail & Related papers (2024-05-27T11:27:00Z)
- Prompt-based Code Completion via Multi-Retrieval Augmented Generation [15.233727939816388]
ProCC is a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm.
ProCC outperforms the state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite.
ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding a 5.6% improvement over our studied fine-tuned model.
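The bandit component can be pictured as choosing one retriever per completion request and updating from the observed reward. Below is a plain epsilon-greedy sketch (non-contextual, so simpler than ProCC's contextual formulation) with hypothetical arm names.

```python
import random
from collections import defaultdict


class EpsilonGreedyRetrieverSelector:
    """Pick one of several retrieval strategies per completion request."""

    def __init__(self, arms, epsilon: float = 0.1):
        self.arms = arms                  # e.g. ["bm25", "dense", "ast"]
        self.epsilon = epsilon
        self.counts = defaultdict(int)
        self.values = defaultdict(float)  # running mean reward per arm

    def select(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(self.arms)                       # explore
        return max(self.arms, key=lambda a: self.values[a])       # exploit

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n       # running mean
```

In use, `reward` might be 1.0 when the completed code matches the reference and 0.0 otherwise.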
arXiv Detail & Related papers (2024-05-13T07:56:15Z)
- Magicoder: Empowering Code Generation with OSS-Instruct [14.414411313794911]
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code.
Magicoder models are trained on 75K synthetic instruction samples generated with OSS-Instruct.
Both Magicoder and MagicoderS substantially outperform state-of-the-art code models with similar or even larger sizes on a wide range of coding benchmarks.
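OSS-Instruct, as described, seeds instruction synthesis with real open-source code snippets. A minimal sketch of that seeding step, assuming a generic `llm(prompt)` completion call and an illustrative prompt rather than Magicoder's actual template:

```python
def synthesize_instruction(llm, seed_snippet: str) -> dict:
    """Turn an open-source code snippet into an instruction-response pair.

    `llm(prompt)` is a placeholder for any text-completion call; the prompt
    wording below is illustrative only.
    """
    prompt = (
        "Gain inspiration from the following code snippet and create a "
        "self-contained coding problem, then solve it.\n\n"
        f"```\n{seed_snippet}\n```"
    )
    problem_and_solution = llm(prompt)
    return {"seed": seed_snippet, "sample": problem_and_solution}
```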
arXiv Detail & Related papers (2023-12-04T18:50:35Z)
- CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z)
- AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation [9.915327592560896]
We present CodeCompose, an AI-assisted code authoring tool developed and deployed at Meta internally.
CodeCompose is based on the InCoder LLM that merges generative capabilities with bi-directionality.
In a random sample of 20K source code files, we are able to reproduce hidden lines between 40% and 58% of the time, improvements of 1.4x and 4.1x over a model trained only on public data.
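The hidden-line metric reads as exact-match reconstruction: mask a line, ask the model to fill it in, and count matches. One way to phrase that evaluation, assuming a hypothetical fill-in-the-middle `complete(prefix, suffix)` call (this is not Meta's internal harness):

```python
def hidden_line_accuracy(complete, files: list[str]) -> float:
    """Mask each interior line of each file and test exact reproduction.

    `complete(prefix, suffix)` stands in for any fill-in-the-middle model call.
    """
    hits, total = 0, 0
    for text in files:
        lines = text.splitlines()
        for i in range(1, len(lines) - 1):  # skip first/last lines
            prefix = "\n".join(lines[:i])
            suffix = "\n".join(lines[i + 1:])
            prediction = complete(prefix, suffix)
            hits += int(prediction.strip() == lines[i].strip())
            total += 1
    return hits / total if total else 0.0
```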
arXiv Detail & Related papers (2023-05-20T00:45:15Z)
- StarCoder: may the source be with you! [79.93915935620798]
The BigCode community introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length.
StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories.
arXiv Detail & Related papers (2023-05-09T08:16:42Z)
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X [50.008474888951525]
We introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation.
CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages.
arXiv Detail & Related papers (2023-03-30T17:34:01Z)
- Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
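The reranking criterion combines the Coder likelihood p(program | instruction) with the Reviewer likelihood p(instruction | program). A minimal sketch, assuming hypothetical `coder_logprob` and `reviewer_logprob` scoring functions:

```python
def coder_reviewer_rerank(instruction: str, programs: list[str],
                          coder_logprob, reviewer_logprob) -> str:
    """Rerank sampled programs by Coder + Reviewer log-likelihoods.

    coder_logprob(program, instruction)    ~ log p(program | instruction)
    reviewer_logprob(instruction, program) ~ log p(instruction | program)
    Both are placeholder scoring calls to a code language model.
    """
    def score(prog: str) -> float:
        return coder_logprob(prog, instruction) + reviewer_logprob(instruction, prog)

    return max(programs, key=score)
```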
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
- Compilable Neural Code Generation with Compiler Feedback [43.97362484564799]
This paper proposes a three-stage pipeline for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination.
Experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, respectively.
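The compilability signal driving both the reinforcement and discrimination stages reduces to a binary compile check per sample. A minimal sketch for Python sources (illustrative; the paper's pipeline wraps this signal in fine-tuning and reinforcement stages):

```python
def compiles(source: str) -> bool:
    """Return True if the generated source compiles (usable as a reward)."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


def compile_success_rate(samples: list[str]) -> float:
    """Aggregate compile success over a batch of generated programs."""
    return sum(compiles(s) for s in samples) / len(samples) if samples else 0.0
```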
arXiv Detail & Related papers (2022-03-10T03:15:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.