CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation
- URL: http://arxiv.org/abs/2501.07811v1
- Date: Tue, 14 Jan 2025 03:21:10 GMT
- Title: CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation
- Authors: Ruwei Pan, Hongyu Zhang, Chao Liu,
- Abstract summary: Code generation aims to produce code that fulfills requirements written in natural languages automatically.
Large language Models (LLMs) like ChatGPT fail to ensure the syntactic and semantic correctness of the generated code.
We propose CodeCoR, a self-reflective multi-agent framework that evaluates the effectiveness of each agent and their collaborations.
- Score: 10.048098631259876
- License:
- Abstract: Code generation aims to produce code that fulfills requirements written in natural languages automatically. Large language Models (LLMs) like ChatGPT have demonstrated promising effectiveness in this area. Nonetheless, these LLMs often fail to ensure the syntactic and semantic correctness of the generated code. Recently, researchers proposed multi-agent frameworks that guide LLMs with different prompts to analyze programming tasks, generate code, perform testing in a sequential workflow. However, the performance of the workflow is not robust as the code generation depends on the performance of each agent. To address this challenge, we propose CodeCoR, a self-reflective multi-agent framework that evaluates the effectiveness of each agent and their collaborations. Specifically, for a given task description, four agents in CodeCoR generate prompts, code, test cases, and repair advice, respectively. Each agent generates more than one output and prunes away the low-quality ones. The generated code is tested in the local environment: the code that fails to pass the generated test cases is sent to the repair agent and the coding agent re-generates the code based on repair advice. Finally, the code that passes the most number of generated test cases is returned to users. Our experiments on four widely used datasets, HumanEval, HumanEval-ET, MBPP, and MBPP-ET, demonstrate that CodeCoR significantly outperforms existing baselines (e.g., CodeCoT and MapCoder), achieving an average Pass@1 score of 77.8%.
Related papers
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents [10.730852617039451]
We investigate the capability of LLM-based Code Agents to formalize user issues into test cases.
We propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth bug-fixes, and golden tests.
We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed for test generation.
arXiv Detail & Related papers (2024-06-18T14:54:37Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Using LLM such as ChatGPT for Designing and Implementing a RISC
Processor: Execution,Challenges and Limitations [11.07566083431614]
The paper reviews the associated steps such as parsing, tokenization, encoding, attention mechanism, sampling the tokens and iterations during code generation.
The generated code for the RISC components is verified through testbenches and hardware implementation on a FPGA board.
arXiv Detail & Related papers (2024-01-18T20:14:10Z) - CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [41.038584732889895]
Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks.
Our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation.
We present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation.
arXiv Detail & Related papers (2024-01-14T18:12:03Z) - AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [11.155351560550853]
This paper introduces Multi-Agent Assistant Code Generation (AgentCoder)
AgentCoder is a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent.
Our experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models.
arXiv Detail & Related papers (2023-12-20T13:22:41Z) - CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z) - CodeT5+: Open Code Large Language Models for Code Understanding and
Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.