FuguReport

Recursive Multi-Agent Systems

Authors: Xiyuan Yang, Jiaru Zou, Rui Pan, Ruizhong Qiu, Pan Lu, Shizhe Diao, Jindong Jiang, Hanghang Tong, Tong Zhang, Markus J. Buehler, Jingrui He, James Zou
Affiliations: University of Illinois Urbana-Champaign / Stanford University / NVIDIA / Massachusetts Institute of Technology
Categories: Method / Multi-Agent Systems / Collaborative heterogeneous agents, Theory / Learning Dynamics / Theoretical runtime and training analysis, Application / Multi-Agent Cooperation / Efficient coordination in agent loops
License: CC BY 4.0

Abstract Overview

This paper introduces RecursiveMAS, a multi-agent framework that connects heterogeneous LLM agents into a recursive loop operating in latent space rather than through text exchanges. The framework uses a lightweight RecursiveLink module with an inner link for latent-thought generation within each agent and an outer link for transferring hidden representations across agents of different types and sizes. The system is trained with a two-stage inner-outer loop procedure: the inner loop warm-starts each agent's latent generation capability, while the outer loop optimizes cross-agent coordination over multiple recursion rounds via shared gradient-based credit assignment. Theoretical analyses of runtime complexity and gradient stability are provided. The framework is evaluated across four collaboration patterns and nine benchmarks spanning mathematics, science, medicine, search, and code generation.
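
To make the loop structure concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each frozen agent can be stood in for by a fixed random tanh layer, that the inner and outer links are single linear maps, and that the agents run in sequence. The names FrozenAgent, Link, and recursive_loop, and all dimensions, are hypothetical. The sketch only illustrates hidden vectors moving between agents of different widths without any text decoding; no training is performed.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random matrix scaled by 1/sqrt(cols) to keep activations bounded."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenAgent:
    """Hypothetical stand-in for a frozen LLM: a fixed tanh layer of width dim."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.weights = random_matrix(dim, dim, rng)

    def step(self, hidden):
        return [math.tanh(x) for x in matvec(self.weights, hidden)]


class Link:
    """Hypothetical lightweight link: one linear map from in_dim to out_dim."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = random_matrix(out_dim, in_dim, rng)

    def __call__(self, hidden):
        return matvec(self.weights, hidden)


def recursive_loop(agents, inner_links, outer_links, hidden, rounds, inner_steps):
    """Pass a latent vector around the agent loop for a number of recursion rounds."""
    for _ in range(rounds):
        for agent, inner, outer in zip(agents, inner_links, outer_links):
            # Inner link: feed the agent's latent output back in as its next input.
            for _ in range(inner_steps):
                hidden = agent.step(inner(hidden))
            # Outer link: map the latent to the next agent's hidden width.
            hidden = outer(hidden)
    return hidden


rng = random.Random(0)
dims = [8, 12, 6]  # assumed hidden widths of three heterogeneous agents
agents = [FrozenAgent(d, rng) for d in dims]
inner_links = [Link(d, d, rng) for d in dims]
outer_links = [Link(dims[i], dims[(i + 1) % len(dims)], rng) for i in range(len(dims))]
start = [rng.gauss(0.0, 1.0) for _ in range(dims[0])]
final = recursive_loop(agents, inner_links, outer_links, start, rounds=3, inner_steps=2)
print(len(final))
```

Because the last outer link maps back to the first agent's width, the loop closes on itself and can be repeated for any number of rounds. In this sketch only the Link weights would be candidates for training, mirroring the frozen-agent setup described in the summary.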

Novelty

The primary novelty is extending recursive latent-space computation from a single language model to an entire heterogeneous multi-agent system, treating the full agent loop as one unified recursive computation. The RecursiveLink mechanism and the inner-outer loop training scheme enable cross-agent latent coordination without intermediate text decoding. The authors describe this as the first attempt to apply recursive scaling at the system level.

Results

The paper reports that RecursiveMAS improves average accuracy by 8.3% over the strongest baseline on each benchmark (including AIME2025, AIME2026, GPQA-Diamond, MATH500, MedQA, and LiveCodeBench), while delivering a 1.2× to 2.4× end-to-end inference speedup and a 34.6% to 75.6% token reduction compared with text-based recursive multi-agent interaction. Training only the lightweight RecursiveLink modules (13.12M parameters, 0.31% of the total) achieves better accuracy than LoRA or full supervised fine-tuning, with lower GPU memory use and lower estimated cost.
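
The reported figures can be turned into a few derived quantities with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the implied total parameter count is an inference from the rounded 0.31% figure, not a number stated in the summary.

```python
def implied_total_params(link_params, link_fraction):
    """Total parameter count implied by the link size and its share of the total."""
    return link_params / link_fraction


def remaining_fraction(reduction_percent):
    """Fraction of tokens still used after a given percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def relative_time(speedup):
    """Inference time relative to the baseline for a given speedup factor."""
    return 1.0 / speedup


total = implied_total_params(13.12e6, 0.0031)
print(f"implied total parameters: {total / 1e9:.2f}B")
print(f"tokens remaining: {remaining_fraction(34.6):.3f} to {remaining_fraction(75.6):.3f}")
print(f"relative time: {relative_time(1.2):.3f} to {relative_time(2.4):.3f}")
```

Under these figures, the trained links correspond to a system of roughly 4.2 billion parameters in total, and the best-case token reduction leaves about a quarter of the original token count.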

Key Points

  1. RecursiveMAS connects heterogeneous agents in a recurrent latent-space loop using lightweight inner and outer RecursiveLink modules, with only the link parameters (0.31% of total) trained while all LLM agent parameters remain frozen.
  2. Theoretical analysis shows that latent-space recursion replaces expensive per-step vocabulary-space decoding (cost proportional to vocabulary size |V|) with cheaper latent transformations, and that it maintains near-constant gradient norms during backpropagation, whereas text-mediated recursion suffers from vanishing gradients.
  3. Across nine benchmarks and four collaboration patterns (sequential, mixture, distillation, deliberation), RecursiveMAS achieves an average 8.3% accuracy improvement over the strongest baselines while speeding up inference by up to 2.4× and cutting token usage by up to 75.6%.
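
The cost argument in the second point can be illustrated with a rough count of multiply-accumulate operations per hand-off between agents. This is a sketch under stated assumptions, not the paper's complexity analysis: it assumes that text-mediated hand-off is dominated by a hidden-to-vocabulary projection of size d × |V|, and that a latent link is a two-layer bottleneck map (d to r to d). The values of d, |V|, and r below are hypothetical, and the transformer forward pass shared by both variants is ignored.

```python
def decode_step_macs(hidden_dim, vocab_size):
    """Multiply-accumulates for projecting one hidden state to vocabulary logits."""
    return hidden_dim * vocab_size


def latent_step_macs(hidden_dim, link_dim):
    """Multiply-accumulates for an assumed bottleneck link: d -> r -> d."""
    return 2 * hidden_dim * link_dim


hidden_dim = 4096     # hypothetical hidden width d
vocab_size = 150_000  # hypothetical vocabulary size |V|
link_dim = 1024       # hypothetical link bottleneck r

decode_cost = decode_step_macs(hidden_dim, vocab_size)
latent_cost = latent_step_macs(hidden_dim, link_dim)
print(f"decode: {decode_cost:,} MACs, latent: {latent_cost:,} MACs")
print(f"ratio: {decode_cost / latent_cost:.1f}x")
```

Under these assumptions the ratio simplifies to |V| / (2r), so the advantage of the latent path grows with vocabulary size and shrinks as the link gets wider.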

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.