Related papers: ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems

URL: http://arxiv.org/abs/2602.01848v1
Date: Mon, 02 Feb 2026 09:20:59 GMT
Title: ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems
Authors: Salaheddin Alzu'bi, Baran Nama, Arda Kaz, Anushri Eswaran, Weiyuan Chen, Sarvesh Khetan, Rishab Bala, Tu Vu, Sewoong Oh,
Abstract summary: Current agentic frameworks underperform on long-horizon tasks. We introduce ROMA, a domain-agnostic framework that addresses these limitations. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks.
Score: 25.131570054560353
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade performance, and opaque execution traces make failures difficult to localize or debug. We introduce ROMA (Recursive Open Meta-Agents), a domain-agnostic framework that addresses these limitations through recursive task decomposition and structured aggregation. ROMA decomposes goals into dependency-aware subtask trees that can be executed in parallel, while aggregation compresses and validates intermediate results to control context growth. Our framework standardizes agent construction around four modular roles (Atomizer, which decides whether a task should be decomposed; Planner; Executor; and Aggregator), which cleanly separate orchestration from model selection and enable transparent, hierarchical execution traces. This design supports heterogeneous multi-agent systems that mix models and tools according to cost, latency, and capability. To adapt ROMA to specific tasks without fine-tuning, we further introduce GEPA+, an improved Genetic-Pareto prompt proposer that searches over prompts within ROMA's component hierarchy while preserving interface contracts. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks. On SEAL-0, which evaluates reasoning over conflicting web evidence, ROMA instantiated with GLM-4.6 improves accuracy by 9.9% over Kimi-Researcher. On EQ-Bench, a long-form writing benchmark, ROMA enables DeepSeek-V3 to match the performance of leading closed-source models such as Claude Sonnet 4.5. Our results demonstrate that recursive, modular agent architectures can scale reasoning depth while remaining interpretable, flexible, and model-agnostic.
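
The four-role loop described in the abstract can be illustrated with a minimal Python sketch. This is not ROMA's actual API: the names `TraceNode`, `is_atomic`, `plan`, `execute`, `aggregate`, `solve`, and `depth` are hypothetical, the "task" is a toy list of integers to total, and each role is a plain function standing in for what would be a model or tool call. Subtasks here are independent and run sequentially; the dependency-aware parallel execution mentioned in the abstract is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

Task = List[int]  # toy task (assumption): a list of numbers to total


@dataclass
class TraceNode:
    """One node of the hierarchical execution trace."""
    task: Task
    result: int = 0
    children: List["TraceNode"] = field(default_factory=list)


def is_atomic(task: Task) -> bool:
    """Atomizer stand-in: decide whether the task should be decomposed."""
    return len(task) <= 2


def plan(task: Task) -> List[Task]:
    """Planner stand-in: split a non-atomic task into two subtasks."""
    mid = len(task) // 2
    return [task[:mid], task[mid:]]


def execute(task: Task) -> int:
    """Executor stand-in: solve an atomic task directly."""
    return sum(task)


def aggregate(results: List[int]) -> int:
    """Aggregator stand-in: combine child results into one compact value."""
    return sum(results)


def solve(task: Task) -> TraceNode:
    """Recursively decompose, execute leaves, and aggregate on the way up."""
    node = TraceNode(task)
    if is_atomic(task):
        node.result = execute(task)
        return node
    node.children = [solve(sub) for sub in plan(task)]
    node.result = aggregate([child.result for child in node.children])
    return node


def depth(node: TraceNode) -> int:
    """Height of the trace tree, counting the root as level 1."""
    return 1 + max((depth(child) for child in node.children), default=0)


if __name__ == "__main__":
    root = solve([1, 2, 3, 4, 5, 6, 7])
    print(root.result, depth(root))
```

Because every call returns a `TraceNode`, the full decomposition remains inspectable after the run, which is the kind of hierarchical trace the abstract refers to. In a real system, each stand-in function would presumably be swapped for a different model or tool without changing `solve`.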

Related papers

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration [36.04766562662082]
AOrchestra is a framework-agnostic agent abstraction that models any agent as a tuple of Instruction, Context, Tools, and Model. It curates task-relevant context, tools, and models, and delegates execution via on-the-fly automatic agent creation. AOrchestra achieves a 16.28% relative improvement over the strongest baseline when paired with Gemini.
arXiv Detail & Related papers (2026-02-03T17:46:16Z)
DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task [9.51787137194505]
DDL2PropBank is a novel benchmark task that maps relational database schemas to PropBank rolesets. We implement identical agent logic across 10 frameworks and evaluate along two dimensions: (i) code complexity via static analysis, and (ii) AI-assistability. Our results reveal a threefold complexity spectrum, with Pydantic AI and Agno requiring the least implementation overhead.
arXiv Detail & Related papers (2026-02-03T01:10:59Z)
AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving [16.664502126572856]
AIConfigurator is a unified performance-modeling system for Large Language Model (LLM) inference. It enables rapid, framework-agnostic configuration search without requiring GPU-based profiling. It identifies superior serving configurations that improve performance by up to 40% for dense models.
arXiv Detail & Related papers (2026-01-09T20:03:57Z)
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. We propose M-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z)
Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design. We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and crucially, how Monad Transformers allow for the systematic composition of these capabilities. This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z)
Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding [49.26132236798123]
Vision Language Models (VLMs) have gradually become a primary approach in document understanding. We propose SLEUTH, a multi-agent framework that orchestrates a retriever and four collaborative agents in a coarse-to-fine process. The framework identifies key textual and visual clues within the retrieved pages, filters for salient visual evidence such as tables and charts, and analyzes the query to devise a reasoning strategy.
arXiv Detail & Related papers (2025-11-28T03:09:40Z)
AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems [7.408263799616532]
We present AgentGit, a framework that brings Git-like rollback and branching to multi-agent systems (MAS). We show that AgentGit significantly reduces redundant computation, runtime, and token usage, and supports parallel exploration across multiple branches. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, computation, and A/B testing in collaborative AI systems.
arXiv Detail & Related papers (2025-11-01T17:11:31Z)
MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration [0.0]
MSC-Bench is a large-scale benchmark for evaluating multi-hop, end-to-end tool orchestration by LLM agents. It addresses gaps by constructing ground truth through 'equal function sets', allowing objective metrics such as F1 score. It systematically tests agent capabilities from single-tool orchestration to complex cross-server planning, and robustness to out-of-scope requests.
arXiv Detail & Related papers (2025-10-22T09:45:11Z)
AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation [0.509780930114934]
AGENTIQL is an agent-inspired framework that combines a reasoning agent for question decomposition, a coding agent for sub-query generation, and a refinement step for column selection. We evaluate AGENTIQL on the Spider benchmark, achieving up to 86.07% EX with 14B models using the Planner&Executor merging strategy. Beyond accuracy, AGENTIQL enhances transparency by exposing intermediate reasoning steps, offering a robust, scalable, and interpretable approach to semantic parsing.
arXiv Detail & Related papers (2025-10-12T15:35:05Z)
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution [48.7788770680643]
Flash-Searcher is a novel parallel agent reasoning framework. It decomposes complex tasks into subtasks with explicit dependencies, enabling concurrent execution of independent reasoning paths. It achieves 67.7% accuracy on BrowseComp and 83% on xbench-DeepSearch, while reducing agent execution steps by up to 35% compared to current frameworks.
arXiv Detail & Related papers (2025-09-29T17:39:30Z)
Maestro: Joint Graph & Config Optimization for Reliable AI Agents [53.71882250666667]
Maestro is a framework-agnostic holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality. On the IFBench and HotpotQA benchmarks, Maestro consistently surpasses leading prompt optimizers (MIPROv2, GEPA, and GEPA+) by an average of 12%, 4.9%, and 4.86%, respectively.
arXiv Detail & Related papers (2025-09-04T20:00:37Z)
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling. It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration. It shows superior performance with a smaller parameter scale without sacrificing ability on general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.