Related papers: o1-Coder: an o1 Replication for Coding

o1-Coder: an o1 Replication for Coding

URL: http://arxiv.org/abs/2412.00154v2
Date: Tue, 10 Dec 2024 04:39:25 GMT
Title: o1-Coder: an o1 Replication for Coding
Authors: Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang,
Abstract summary: O1-CODER is an attempt to replicate OpenAI's o1 model with a focus on coding tasks.<n>It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities.
Score: 16.01327180847857
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to initially produce pseudocode and then generate the full code. The report also addresses the opportunities and challenges in deploying o1-like models in real-world applications, suggesting transitioning to the System-2 paradigm and highlighting the imperative for world model construction. Updated model progress and experimental results will be reported in subsequent versions. All source code, curated datasets, as well as the derived models are disclosed at https://github.com/ADaM-BJTU/O1-CODER .

Related papers

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
In-Context Code-Text Learning for Bimodal Software Engineering [26.0027882745058]
Bimodal software analysis initially appeared to be within reach with the advent of large language models. We postulate that in-context learning for the code-text bimodality is a promising avenue. We consider a diverse dataset encompassing 23 software engineering tasks, which we transform in an in-context learning format.
arXiv Detail & Related papers (2024-10-08T19:42:00Z)
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search [5.913758275518443]
We consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL) Calling code instead of LLMs for planning has potential to be more precise, reliable, interpretable, and extremely efficient. We show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
arXiv Detail & Related papers (2024-05-24T09:31:26Z)
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models. We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks. We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks. FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions. We find that CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation [6.139760107605468]
Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably for its efficacy in complex reasoning tasks. We present Code Chain-of-Thought (CodeCoT) that integrates CoT with a self-examination process for code generation.
arXiv Detail & Related papers (2023-08-17T04:58:51Z)
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM. For learning methods, we explore the claim of a "free lunch" hypothesis. For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation [10.75138604869187]
In some domain-specific scenarios, building such a large paired corpus for code generation is difficult because there is no directly available pairing data. We propose a task augmentation method that incorporates domain knowledge into code generation models through auxiliary tasks and a Subtoken-TranX model. Our experimental results demonstrate that the subtoken-level TranX model outperforms the original TranX model and the Transformer model on our dataset.
arXiv Detail & Related papers (2022-08-22T06:57:51Z)
Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks. We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo. We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z)
Energy-Based Models for Code Generation under Compilability Constraints [2.9176992922046923]
In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm to train a generative model approxing the EBM.
arXiv Detail & Related papers (2021-06-09T11:06:32Z)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages [117.34242908773061]
CodeBERT is a pre-trained model for programming language (PL) and nat-ural language (NL) We develop CodeBERT with Transformer-based neural architecture. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters.
arXiv Detail & Related papers (2020-02-19T13:09:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.