Related papers: VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

URL: http://arxiv.org/abs/2410.19245v1
Date: Fri, 25 Oct 2024 01:52:15 GMT
Title: VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs
Authors: Zixiao Zhao, Jing Sun, Zhiyuan Wei, Cheng-Hao Cai, Zhe Hou, Jin Song Dong,
Abstract summary: This paper presents a multi-agent framework that collaboratively completes auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution.
Score: 8.380216582290025
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the field of automated programming, large language models (LLMs) have demonstrated foundational generative capabilities when given detailed task descriptions. However, their current functionalities are primarily limited to function-level development, restricting their effectiveness in complex project environments and specific application scenarios, such as complicated image-processing tasks. This paper presents a multi-agent framework that utilises a hybrid set of LLMs, including GPT-4o and locally deployed open-source models, which collaboratively complete auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation that works together to produce software products. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution for code generation. We evaluated our approach using benchmark datasets, and the experimental results demonstrate that VisionCoder significantly outperforms existing methods in image processing auto-programming tasks.

Related papers

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy [31.041340552853004]
Graph Collaboration MARL (LGC-MARL) is a framework that efficiently combines Large Language Models (LLMs) and Multi-Agent Reinforcement Learning (MARL) LGC-MARL decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL.
arXiv Detail & Related papers (2025-03-13T05:02:49Z)
AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA [9.450927573476822]
textitAgentPS is a novel framework that integrates Agentic Process Supervision into MLLMs via multi-round question answering during fine-tuning. textitAgentPS demonstrates significant performance improvements over baseline MLLMs on proprietary TikTok datasets.
arXiv Detail & Related papers (2024-12-15T04:58:00Z)
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z)
A Layered Architecture for Developing and Enhancing Capabilities in Large Language Model-based Software Systems [18.615283725693494]
This paper introduces a layered architecture that organizes Large Language Models (LLMs) software system development into distinct layers. By aligning capabilities with these layers, the framework encourages the systematic implementation of capabilities in effective and efficient ways.
arXiv Detail & Related papers (2024-11-19T09:18:20Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models [23.50705152648991]
Multi-task learning (MTL) benefits the fine-tuning of large language models (LLMs) Existing MTL strategies for LLMs often fall short by either being computationally intensive or failing to ensure simultaneous task convergence. This paper presents CoBa, a new MTL approach designed to effectively manage task convergence balance with minimal computational overhead.
arXiv Detail & Related papers (2024-10-09T10:20:32Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI [64.57616646552869]
This paper explores collaborative AI systems that use to enhance performance to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-based framework that automatically generates complex, offering greater flexibility and scalability compared to monolithic models. The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs' is a cutting-edge multi-agent framework designed to revolutionize the application of large language models. Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z)
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities. When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow [3.2821049498759094]
We propose a novel model- and feature-based approach to development of vehicle software systems. One of the key points of the presented approach is the inclusion of modern generative AI, specifically Large Language Models (LLMs) The resulting pipeline is automated to a large extent, with feedback being generated at each step.
arXiv Detail & Related papers (2024-03-21T15:07:57Z)
Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs. We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback [75.62712247421146]
De-fine is a training-free framework that decomposes complex tasks into simpler subtasks and refines programs through auto-feedback. Our experiments across various visual tasks show that De-fine creates more robust programs.
arXiv Detail & Related papers (2023-11-21T06:24:09Z)
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs) It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks. Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents [0.0]
We present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems. Our framework introduces a collaborative environment where multiple intelligent agent components, each with distinctive attributes and roles, work together to handle complex tasks more efficiently and effectively.
arXiv Detail & Related papers (2023-06-05T23:55:37Z)
Self-collaboration Code Generation via ChatGPT [35.88318116340547]
Large Language Models (LLMs) have demonstrated remarkable code-generation ability, but struggle with complex tasks. We present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework.
arXiv Detail & Related papers (2023-04-15T16:33:32Z)
Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL) In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula. In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.