MARCO: Multi-Agent Real-time Chat Orchestration
- URL: http://arxiv.org/abs/2410.21784v1
- Date: Tue, 29 Oct 2024 06:42:27 GMT
- Title: MARCO: Multi-Agent Real-time Chat Orchestration
- Authors: Anubhav Shrimal, Stanley Kanagaraj, Kriti Biswas, Swarnalatha Raghuraman, Anish Nediyanchath, Yi Zhang, Promod Yenigalla,
- Abstract summary: We present MARCO, a Multi-Agent Real-time Chat Orchestration framework for automating tasks using LLMs.
MARCO addresses key challenges in utilizing LLMs for complex, multi-step task execution.
We show MARCO's superior performance with 94.48% and 92.74% accuracy on task execution for Digital Restaurant Service Platform conversations and Retail conversations datasets respectively.
- Score: 6.7741570640544415
- License:
- Abstract: Large language model advancements have enabled the development of multi-agent frameworks to tackle complex, real-world problems such as to automate tasks that require interactions with diverse tools, reasoning, and human collaboration. We present MARCO, a Multi-Agent Real-time Chat Orchestration framework for automating tasks using LLMs. MARCO addresses key challenges in utilizing LLMs for complex, multi-step task execution. It incorporates robust guardrails to steer LLM behavior, validate outputs, and recover from errors that stem from inconsistent output formatting, function and parameter hallucination, and lack of domain knowledge. Through extensive experiments we demonstrate MARCO's superior performance with 94.48% and 92.74% accuracy on task execution for Digital Restaurant Service Platform conversations and Retail conversations datasets respectively along with 44.91% improved latency and 33.71% cost reduction. We also report effects of guardrails in performance gain along with comparisons of various LLM models, both open-source and proprietary. The modular and generic design of MARCO allows it to be adapted for automating tasks across domains and to execute complex usecases through multi-turn interactions.
Related papers
- Improving Multi-turn Task Completion in Task-Oriented Dialog Systems via Prompt Chaining and Fine-Grained Feedback [2.246166820363412]
Task-oriented dialog (TOD) systems facilitate users in accomplishing complex, multi-turn tasks through natural language.
LLMs struggle to reliably handle multi-turn task completion.
We propose RealTOD, a novel framework that enhances TOD systems through prompt chaining and fine-grained feedback mechanisms.
arXiv Detail & Related papers (2025-02-18T21:36:19Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation [52.35744453954844]
This paper introduces MMRC, a benchmark for evaluating six core open-ended abilities of MLLMs.
Evaluations on 20 MLLMs in MMRC indicate an accuracy drop during open-ended interactions.
We propose a simple yet effective NOTE-TAKING strategy, which can record key information from the conversation and remind the model during its responses.
arXiv Detail & Related papers (2025-02-17T15:24:49Z) - AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA [9.450927573476822]
textitAgentPS is a novel framework that integrates Agentic Process Supervision into MLLMs via multi-round question answering during fine-tuning.
textitAgentPS demonstrates significant performance improvements over baseline MLLMs on proprietary TikTok datasets.
arXiv Detail & Related papers (2024-12-15T04:58:00Z) - TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action [103.5952731807559]
We present TACO, a family of multi-modal large action models designed to improve performance on complex, multi-step, and multi-modal tasks.
During inference, TACO produces chains-of-thought-and-action (CoTA), executes intermediate steps by invoking external tools such as OCR, depth estimation and calculator.
This dataset enables TACO to learn complex reasoning and action paths, surpassing existing models trained on instruction tuning data with only direct answers.
arXiv Detail & Related papers (2024-12-07T00:42:04Z) - Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs' is a cutting-edge multi-agent framework designed to revolutionize the application of large language models.
Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - PPTC-R benchmark: Towards Evaluating the Robustness of Large Language
Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z) - TaskLAMA: Probing the Complex Task Understanding of Language Models [13.336015994186955]
Structured Complex Task Decomposition (SCTD) is a problem of breaking down a complex real-world task into a directed acyclic graph over individual steps that contribute to achieving the task.
We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs)
Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline.
arXiv Detail & Related papers (2023-08-29T13:36:45Z) - AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.