Related papers: Self-collaboration Code Generation via ChatGPT

URL: http://arxiv.org/abs/2304.07590v3
Date: Sat, 11 May 2024 14:00:45 GMT
Title: Self-collaboration Code Generation via ChatGPT
Authors: Yihong Dong, Xue Jiang, Zhi Jin, Ge Li
Abstract summary: Large Language Models (LLMs) have demonstrated remarkable code-generation ability, but struggle with complex tasks. We present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework.
Score: 35.88318116340547
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. Specifically, through role instructions, 1) multiple LLM agents act as distinct 'experts', each responsible for a specific subtask within a complex task; and 2) the way they collaborate and interact is specified, so that the different roles form a virtual team and facilitate each other's work. Ultimately, the virtual team addresses code-generation tasks collaboratively without the need for human intervention. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework. Thus, we assemble an elementary team consisting of three LLM roles (i.e., analyst, coder, and tester) responsible for software development's analysis, coding, and testing stages. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation improves Pass@1 by a relative 29.9%-47.1% compared to the base LLM agent. Moreover, we showcase that self-collaboration could potentially enable LLMs to efficiently handle complex repository-level tasks that are not readily solved by the single LLM agent.
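
The analyst, coder, and tester roles described in the abstract can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the role instructions, the `self_collaborate` function, the `max_rounds` limit, the "PASS" convention, and the canned `fake_llm` stand-in are all assumptions introduced here so that the example runs without a real model.

```python
from typing import Callable

# Hypothetical role instructions; the paper's actual prompts are not reproduced here.
ROLE_INSTRUCTIONS = {
    "analyst": "Decompose the requirement into a high-level plan.",
    "coder": "Write or repair code following the plan and any test feedback.",
    "tester": "Review the code and report problems, or reply PASS.",
}

# An LLM is modelled as a function: (role_instruction, message) -> reply.
LLM = Callable[[str, str], str]


def self_collaborate(requirement: str, llm: LLM, max_rounds: int = 3) -> str:
    """Run one analysis step, then alternate testing and repair (assumed loop)."""
    plan = llm(ROLE_INSTRUCTIONS["analyst"], requirement)
    code = llm(ROLE_INSTRUCTIONS["coder"],
               f"Requirement:\n{requirement}\nPlan:\n{plan}")
    for _ in range(max_rounds):
        report = llm(ROLE_INSTRUCTIONS["tester"], code)
        if report.strip().upper().startswith("PASS"):
            break  # the tester found nothing to fix
        code = llm(ROLE_INSTRUCTIONS["coder"],
                   f"Plan:\n{plan}\nCode:\n{code}\nTest report:\n{report}")
    return code


def fake_llm(role_instruction: str, message: str) -> str:
    """Canned stand-in for a real model call, so the sketch runs offline."""
    if role_instruction == ROLE_INSTRUCTIONS["analyst"]:
        return "1. Define add(a, b). 2. Return the sum."
    if role_instruction == ROLE_INSTRUCTIONS["coder"]:
        if "Test report" in message:
            return "def add(a, b):\n    return a + b"  # repaired version
        return "def add(a, b):\n    return a - b"      # deliberately buggy first draft
    # Tester role: accept only code that actually adds.
    return "PASS" if "a + b" in message else "Bug: add subtracts instead of adding."


result = self_collaborate("Write add(a, b) that returns the sum.", fake_llm)
print(result)
```

In this sketch the same `llm` callable plays every role and only the role instruction changes, which is one way to read "self-collaboration"; a real setup would replace `fake_llm` with calls to an actual model.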

Related papers

Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning [12.923902619187274]
This work studies how LLMs can adaptively collaborate to perform complex embodied reasoning tasks. MINDcraft is a platform built to enable LLM agents to control characters in the open-world game of Minecraft. An experimental study finds that the primary bottleneck in collaborating effectively for current state-of-the-art agents is efficient natural language communication.
arXiv Detail & Related papers (2025-04-24T21:28:16Z)
Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy [31.041340552853004]
Graph Collaboration MARL (LGC-MARL) is a framework that efficiently combines Large Language Models (LLMs) and Multi-Agent Reinforcement Learning (MARL). LGC-MARL decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL.
arXiv Detail & Related papers (2025-03-13T05:02:49Z)
When One LLM Drools, Multi-LLM Collaboration Rules [98.71562711695991]
We argue for multi-LLM collaboration to better represent the extensive diversity of data, skills, and people. We organize existing multi-LLM collaboration methods into a hierarchy, based on the level of access and information exchange. We envision multi-LLM collaboration as an essential path toward compositional intelligence and collaborative AI development.
arXiv Detail & Related papers (2025-02-06T21:13:44Z)
VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs [8.380216582290025]
This paper presents a multi-agent framework that collaboratively completes auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution.
arXiv Detail & Related papers (2024-10-25T01:52:15Z)
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work attempts to study using LLM-based agents to design collaborative AI systems autonomously. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating. While ComfyAgent achieves a comparable resolve rate to o1-preview and significantly surpasses other agents on ComfyBench, ComfyAgent has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
Multi-Agent Software Development through Cross-Team Collaboration [30.88149502999973]
We introduce Cross-Team Collaboration (CTC), a scalable multi-team framework for software development. CTC enables orchestrated teams to jointly propose various decisions and communicate with their insights. Results show a notable increase in quality compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-06-13T10:18:36Z)
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World [13.005764902339523]
We design a blocks-world environment where two agents, each having unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. We adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and identify and correct execution errors.
arXiv Detail & Related papers (2024-03-30T04:48:38Z)
Automatic Robotic Development through Collaborative Framework by Large Language Models [13.957351735394683]
We propose an innovative automated collaboration framework inspired by real-world robot developers. This framework employs multiple LLMs in distinct roles: analysts, programmers, and testers. Analysts delve deep into user requirements, enabling programmers to produce precise code, while testers fine-tune the parameters.
arXiv Detail & Related papers (2024-02-06T04:40:27Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code). Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development. We introduce Experiential Co-Learning, a novel LLM-agent learning framework. Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents. Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes. We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.