Related papers: MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

URL: http://arxiv.org/abs/2511.06142v1
Date: Sat, 08 Nov 2025 21:27:09 GMT
Title: MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning
Authors: Sizhe Tang, Jiayu Chen, Tian Lan,
Abstract summary: We propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns.<n> MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2.
Score: 8.28864605730277
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Monte Carlo Tree Search (MCTS), which leverages Upper Confidence Bound for Trees (UCTs) to balance exploration and exploitation through randomized sampling, is instrumental to solving complex planning problems. However, for multi-agent planning, MCTS is confronted with a large combinatorial action space that often grows exponentially with the number of agents. As a result, the branching factor of MCTS during tree expansion also increases exponentially, making it very difficult to efficiently explore and exploit during tree search. To this end, we propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns and enable efficient MCTS in complex multi-agent planning. Our solution can be viewed as projecting the joint-action returns into the low-dimensional space representable using a contextual linear bandit problem formulation. We solve the contextual linear bandit problem with convex and $\mu$-smooth loss functions -- in order to place more importance on better joint actions and mitigate potential representational limitations -- and derive a linear Upper Confidence Bound applied to trees (LinUCT) to enable novel multi-agent exploration and exploitation in the low-dimensional space. We analyze the regret of MALinZero for low-dimensional reward functions and propose an $(1-\tfrac1e)$-approximation algorithm for the joint action selection by maximizing a sub-modular objective. MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2, outperforming both model-based and model-free multi-agent reinforcement learning baselines with faster learning speed and better performance.

Related papers

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks [21.211097851224487]
We introduce MASPOB (Multi-Agent System Prompt Optimization via Bandits), a novel sample-efficient framework based on bandits.<n>To handle topology-induced coupling, MASPOB integrates Graph Neural Networks (GNNs) to capture structural priors, learning topology-aware representations of prompt semantics.
arXiv Detail & Related papers (2026-03-03T05:59:05Z)
TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval.<n>We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z)
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use.<n>We propose bfM-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z)
Multiscale Aggregated Hierarchical Attention (MAHA): A Game Theoretic and Optimization Driven Approach to Efficient Contextual Modeling in Large Language Models [0.0]
Multiscale Aggregated Hierarchical Attention (MAHA) is a novel architectural framework that reformulates the attention mechanism through hierarchical decomposition and mathematically rigorous aggregation.<n>MAHA dynamically partitions the input sequence into hierarchical scales via learnable downsampling operators.<n> Experimental evaluations demonstrate that MAHA achieves superior scalability; empirical FLOPs analysis confirms an 81% reduction in computational cost at a sequence length of 4096 compared to standard attention.
arXiv Detail & Related papers (2025-12-16T21:27:21Z)
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $textbfMA-OSMA$ algorithm to transfer the discrete submodular problem into a continuous optimization.<n>We also introduce a projection-free $textbfMA-OSEA$ algorithm, which effectively utilizes the KL divergence by mixing a uniform distribution.<n>Our algorithms significantly improve the $(frac11+c)$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes [0.9768138268100163]
We introduce the Structurally Decomposed MDP (SD-MDP), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components.<n>We demonstrate superior policy performance over benchmarks across various logistics and finance domains.
arXiv Detail & Related papers (2024-06-23T16:22:40Z)
Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks. Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios. We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
Fleet of Agents: Coordinated Problem Solving with Large Language Models [10.167121757937062]
Fleet of Agents (FoA) is a principled framework utilizing large language models as agents to navigate through dynamic tree searches.<n>FoA spawns a multitude of agents, each exploring the search space autonomously, followed by a selection phase.<n>FoA achieves the best cost-quality trade-off among all benchmarked methods and FoA + LMA3.2-11B surpasses the Llama3.2-90B model.
arXiv Detail & Related papers (2024-05-07T09:36:23Z)
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors. We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval. Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets. Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
arXiv Detail & Related papers (2023-06-09T16:10:26Z)
Multi-Agent Reinforcement Learning for Adaptive Mesh Refinement [17.72127385405445]
We present a novel formulation of adaptive mesh refinement (AMR) as a fully-cooperative Markov game. We design a novel deep multi-agent reinforcement learning algorithm called Value Decomposition Graph Network (VDGN) We show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics.
arXiv Detail & Related papers (2022-11-02T00:41:32Z)
Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching. We propose retro branching; a simple yet effective approach to RL for branching. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.