CodePori: Large Scale Model for Autonomous Software Development by Using
Multi-Agents
- URL: http://arxiv.org/abs/2402.01411v1
- Date: Fri, 2 Feb 2024 13:42:50 GMT
- Title: CodePori: Large Scale Model for Autonomous Software Development by Using
Multi-Agents
- Authors: Zeeshan Rasheed, Muhammad Waseem, Mika Saari, Kari Systä, Pekka
Abrahamsson
- Abstract summary: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) are reshaping the field of Software Engineering (SE).
This paper introduces CodePori, a novel model designed to automate code generation for extensive and complex software projects based on natural language prompts.
We show in the paper that CodePori is able to generate running code for large-scale projects, completing the entire software development process in minutes rather than hours, and at a cost of a few dollars.
- Score: 3.8066447473175304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs)
are reshaping the field of Software Engineering (SE). Existing LLM-based
multi-agent systems have successfully resolved simple dialogue tasks. However,
the potential of LLMs for more complex tasks, such as automated code generation
for large and complex projects, has been explored in only a few existing
works. This paper introduces CodePori, a novel model designed to automate code
generation for extensive and complex software projects based on natural
language prompts. We employ LLM-based multi-AI agents to handle creative and
challenging tasks in autonomous software development. Each agent engages with a
specific task, including system design, code development, code review, code
verification, and test engineering. We show in the paper that CodePori is able
to generate running code for large-scale projects, completing the entire
software development process in minutes rather than hours, and at a cost of a
few dollars. It identifies and mitigates potential security vulnerabilities and
corrects errors while maintaining a solid code performance level. We also
conducted an evaluation of CodePori against existing solutions using HumanEval
and the Mostly Basic Python Problems (MBPP) benchmark. The results
indicate that CodePori improves upon the benchmarks in terms of code accuracy,
efficiency, and overall performance. For example, CodePori improves the pass@1
metric on HumanEval to 87.5% and on MBPP to 86.5%, representing a clear
improvement over the existing models. We also assessed CodePori's performance
through practitioner evaluations, with 91% expressing satisfaction with the
model's performance.
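The pass@1 figures quoted in the abstract can be read through the pass@k metric. The sketch below uses the unbiased estimator introduced alongside HumanEval, 1 - C(n-c, k) / C(n, k) for n samples of which c pass. This listing does not state how CodePori's numbers were computed, so the estimator choice is an assumption, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n generated samples, c of which pass.

    Assumption: this is the unbiased estimator 1 - C(n-c, k) / C(n, k);
    it is not taken from the CodePori paper itself.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, which makes the estimate 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy data only (not results from the paper): four problems,
    # one sample each, three of which pass.
    toy = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(mean_pass_at_k(toy, k=1))
```

With one sample per problem, pass@1 reduces to the fraction of problems whose single generated solution passes all tests.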
Related papers
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address these challenges.
Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving [3.3856216159724983]
We introduce a new approach to code generation tasks leveraging multi-agent prompting.
Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of program synthesis.
Our method consistently delivers superior performance across various programming languages.
arXiv Detail & Related papers (2024-05-18T22:10:15Z)
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence [37.946802472358996]
Large Language Models (LLMs) trained on code are revolutionizing the software development process.
LLMs are being integrated into software development environments to improve the productivity of human programmers.
We introduce the Granite series of decoder-only code models for code generative tasks.
arXiv Detail & Related papers (2024-05-07T13:50:40Z)
- Performance-Aligned LLMs for Generating Fast Code [2.180216161965907]
We introduce a reinforcement learning based methodology to align the outputs of code LLMs with performance.
We demonstrate that our fine-tuned model improves the expected speedup of generated code over base models for a set of benchmark tasks.
arXiv Detail & Related papers (2024-04-29T16:52:38Z)
- When LLM-based Code Generation Meets the Software Development Process [50.82665351100067]
This paper introduces LCG, a code generation framework inspired by established software engineering practices.
LLM agents emulate various software process models, namely LCGWaterfall, LCGTDD, and LCGScrum.
We evaluate LCG across four code generation benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [11.155351560550853]
This paper introduces Multi-Agent Assistant Code Generation (AgentCoder).
AgentCoder is a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent.
Our experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models.
arXiv Detail & Related papers (2023-12-20T13:22:41Z)
- CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.