Related papers: MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

URL: http://arxiv.org/abs/2405.11403v1
Date: Sat, 18 May 2024 22:10:15 GMT
Title: MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
Authors: Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez,
Abstract summary: We introduce a new approach to code generation tasks leveraging multi-agent prompting. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of program synthesis. Our method consistently delivers superior performance across various programming languages.
Score: 3.3856216159724983
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests, presents a significant challenge. While large language models (LLMs) demonstrate impressive proficiency in natural language processing, their performance in code generation tasks remains limited. In this paper, we introduce a new approach to code generation tasks leveraging multi-agent prompting that uniquely replicates the full cycle of program synthesis as observed in human developers. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of this cycle: recalling relevant examples, planning, code generation, and debugging. After conducting thorough experiments, with multiple LLM ablations and analyses across eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder showcases remarkable code generation capabilities, achieving new state-of-the-art results (pass@1) on HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%). Moreover, our method consistently delivers superior performance across various programming languages and varying problem difficulties. We open-source our framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.

Related papers

DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation [20.75363011870647]
DynaCode is a dynamic, complexity-aware benchmark for large language models (LLMs) It evaluates LLMs systematically using a complexity-aware metric, incorporating both code complexity and call-graph structures. Results on 12 latest LLMs show an average performance drop of 16.8% to 45.7% compared to MBPP+, a static code generation benchmark.
arXiv Detail & Related papers (2025-03-13T15:18:56Z)
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data. It comprises question-solution-test triplets that are systematically validated via a self-verification procedure. This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z)
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process. Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging [3.3856216159724983]
We introduce CodeSim, a novel multi-agent code generation framework. As human verifies their understanding of any algorithms through visual simulation, CodeSim uniquely features a method of plan verification. Our framework achieves new state-of-the-art results-(HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%)
arXiv Detail & Related papers (2025-02-08T18:43:59Z)
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks [1.9198713957364215]
Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks. They face significant limitations in handling complex, long-context programming challenges. This paper introduces a novel agentic framework for guided code generation''
arXiv Detail & Related papers (2025-01-11T19:21:53Z)
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters. Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework. Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously. We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process. Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z)
Large Language Models as Code Executors: An Exploratory Study [29.545321608864295]
This paper pioneers the exploration of Large Language Models (LLMs) as code executors. We are the first to examine this feasibility across various LLMs, including OpenAI's o1, GPT-4o, GPT-3.5, DeepSeek, and Qwen-Coder. We introduce an Iterative Instruction Prompting (IIP) technique that processes code snippets line by line, enhancing the accuracy of weaker models by an average of 7.22%.
arXiv Detail & Related papers (2024-10-09T08:23:22Z)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions. We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types. We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks. FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English. We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian. While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
Compilable Neural Code Generation with Compiler Feedback [43.97362484564799]
This paper proposes a three-stage pipeline for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination. Experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 on average and from 70.3 to 96.2 in text-to-code generation, respectively.
arXiv Detail & Related papers (2022-03-10T03:15:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.