Maestro: Joint Graph & Config Optimization for Reliable AI Agents
- URL: http://arxiv.org/abs/2509.04642v1
- Date: Thu, 04 Sep 2025 20:00:37 GMT
- Title: Maestro: Joint Graph & Config Optimization for Reliable AI Agents
- Authors: Wenxiao Wang, Priyatham Kattakinda, Soheil Feizi
- Abstract summary: Maestro is a framework-agnostic holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality. On the IFBench and HotpotQA benchmarks, Maestro consistently surpasses leading prompt optimizers (MIPROv2, GEPA, and GEPA+Merge) by an average of 12%, 4.9%, and 4.86%, respectively.
- Score: 53.71882250666667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building reliable LLM agents requires decisions at two levels: the graph (which modules exist and how information flows) and the configuration of each node (models, prompts, tools, control knobs). Most existing optimizers tune configurations while holding the graph fixed, leaving structural failure modes unaddressed. We introduce Maestro, a framework-agnostic holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality, subject to explicit rollout/token budgets. Beyond numeric metrics, Maestro leverages reflective textual feedback from traces to prioritize edits, improving sample efficiency and targeting specific failure modes. On the IFBench and HotpotQA benchmarks, Maestro consistently surpasses leading prompt optimizers--MIPROv2, GEPA, and GEPA+Merge--by an average of 12%, 4.9%, and 4.86%, respectively; even when restricted to prompt-only optimization, it still leads by 9.65%, 2.37%, and 2.41%. Maestro achieves these results with far fewer rollouts than GEPA. We further show large gains on two applications (interviewer & RAG agents), highlighting that joint graph & configuration search addresses structural failure modes that prompt tuning alone cannot fix.
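The abstract describes a search over two coupled levels, graph structure and per-node configuration, under an explicit rollout budget. A minimal sketch of that idea (the candidate graphs, configurations, and the `evaluate` scoring function below are illustrative placeholders, not from the paper):

```python
import itertools

def evaluate(graph, config):
    """Toy quality metric: a fixed score per (graph, config) pair.

    A real optimizer would run the agent on benchmark tasks and
    aggregate scores over many rollouts.
    """
    scores = {
        ("single_node", "small_model"): 0.40,
        ("single_node", "large_model"): 0.55,
        ("plan_then_answer", "small_model"): 0.60,
        ("plan_then_answer", "large_model"): 0.72,
    }
    return scores[(graph, config)]

def joint_search(candidate_graphs, candidate_configs, rollout_budget):
    """Score (graph, config) pairs jointly until the budget runs out."""
    best, best_score = None, float("-inf")
    spent = 0
    for graph, config in itertools.product(candidate_graphs, candidate_configs):
        if spent >= rollout_budget:
            break  # explicit budget: stop once rollouts are exhausted
        score = evaluate(graph, config)
        spent += 1
        if score > best_score:
            best, best_score = (graph, config), score
    return best, best_score

best, score = joint_search(
    ["single_node", "plan_then_answer"],
    ["small_model", "large_model"],
    rollout_budget=10,
)
print(best, score)
```

The point of the joint loop is that a configuration-only optimizer would fix the graph (say, `single_node`) and never discover that a structural change (`plan_then_answer`) dominates; Maestro additionally uses reflective textual feedback from traces to prioritize which edits to try, which this exhaustive sketch omits.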
Related papers
- ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems [25.131570054560353]
Current agentic frameworks underperform on long-horizon tasks. We introduce ROMA, a domain-agnostic framework that addresses these limitations. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks.
arXiv Detail & Related papers (2026-02-02T09:20:59Z) - TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z) - Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving [38.059017394879284]
Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge. Existing pipelines suffer from low accuracy, excessive token usage, high latency, and low throughput. We present GLM, the first multi-agent Graph-CoT system co-designed with an optimized LLM serving architecture.
arXiv Detail & Related papers (2025-11-03T14:42:53Z) - GraphCogent: Overcoming LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding [8.297882768573427]
Large language models (LLMs) show promising performance on small-scale graph reasoning tasks but fail when handling real-world graphs with complex queries. We propose GraphCogent, a collaborative agent framework that decomposes graph reasoning into specialized cognitive processes: sense, buffer, and execute.
arXiv Detail & Related papers (2025-08-17T14:28:38Z) - Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning [4.703280619961521]
GraphRAG effectively enhances external knowledge integration capabilities by explicitly modeling knowledge relationships. Existing methods suffer from two inherent limitations. We propose Graph Counselor, a GraphRAG method based on multi-agent collaboration.
arXiv Detail & Related papers (2025-06-04T13:31:21Z) - MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design. MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
arXiv Detail & Related papers (2025-05-21T00:56:09Z) - Why Do Multi-Agent LLM Systems Fail? [87.90075668488434]
We introduce MAST-Data, a comprehensive dataset of 1600+ annotated traces collected across 7 popular MAS frameworks. We build the first Multi-Agent System Failure taxonomy (MAST). We leverage MAST and MAST-Data to analyze failure patterns across models (GPT4, Claude 3, Qwen2.5, CodeLlama) and tasks (coding, math, general agent).
arXiv Detail & Related papers (2025-03-17T19:04:38Z) - Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents [27.4884498301785]
We introduce GraphAgent-Reasoner, a fine-tuning-free framework for explicit and precise graph reasoning.
Inspired by distributed graph computation theory, our framework decomposes graph problems into smaller, node-centric tasks that are distributed among multiple agents.
Our framework demonstrates the capability to handle real-world graph reasoning applications such as webpage importance analysis.
arXiv Detail & Related papers (2024-10-07T15:34:14Z) - Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [88.4320775961431]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. We propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z) - Controllable Prompt Tuning For Balancing Group Distributional Robustness [53.336515056479705]
We introduce an optimization scheme that achieves good performance across all groups without severely sacrificing performance on any of them.
We propose Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques.
On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data.
arXiv Detail & Related papers (2024-03-05T06:23:55Z) - Language Agents as Optimizable Graphs [31.220547147952278]
We describe Large Language Models (LLMs)-based agents as computational graphs.
Our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents.
arXiv Detail & Related papers (2024-02-26T18:48:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.