Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization
- URL: http://arxiv.org/abs/2602.11437v1
- Date: Wed, 11 Feb 2026 23:24:15 GMT
- Title: Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization
- Authors: Chengrui Qu, Christopher Yeh, Kishan Panaganti, Eric Mazumdar, Adam Wierman
- Abstract summary: We introduce Distributionally robust IGM (DrIGM), a principle that requires each agent's robust greedy action to align with the robust team-optimal joint action. DrIGM holds for a novel definition of robust individual action values, which is compatible with decentralized greedy execution. We derive DrIGM-compliant robust variants of existing value-factorization architectures.
- Score: 29.92519720312025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperative multi-agent reinforcement learning (MARL) commonly adopts centralized training with decentralized execution, where value-factorization methods enforce the individual-global-maximum (IGM) principle so that decentralized greedy actions recover the team-optimal joint action. However, this recipe remains unreliable in real-world settings due to environmental uncertainties arising from the sim-to-real gap, model mismatch, and system noise. We address this gap by introducing Distributionally robust IGM (DrIGM), a principle that requires each agent's robust greedy action to align with the robust team-optimal joint action. We show that DrIGM holds for a novel definition of robust individual action values, which is compatible with decentralized greedy execution and yields a provable robustness guarantee for the whole system. Building on this foundation, we derive DrIGM-compliant robust variants of existing value-factorization architectures (e.g., VDN/QMIX/QTRAN) that (i) train on robust Q-targets, (ii) preserve scalability, and (iii) integrate seamlessly with existing codebases without bespoke per-agent reward shaping. Empirically, on high-fidelity SustainGym simulators and a StarCraft game environment, our methods consistently improve out-of-distribution performance. Code and data are available at https://github.com/crqu/robust-coMARL.
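To make the robust-Q-target idea concrete, here is a minimal numpy sketch of a distributionally robust Bellman backup under an R-contamination ambiguity set. The function name `robust_q_target`, the contamination level `rho`, and the use of sampled candidate next states are illustrative assumptions, not the paper's exact operator; the DrIGM-compliant architectures themselves are in the linked repository.

```python
import numpy as np

def robust_q_target(reward, next_q_values, gamma=0.99, rho=0.1):
    """Worst-case Bellman target under an R-contamination ambiguity set:
    with probability (1 - rho) the nominal transition occurs, and with
    probability rho an adversary picks the worst candidate next state.

    next_q_values: (n_candidates, n_joint_actions) array of joint Q-values,
    where row 0 is the nominal sampled next state. Hypothetical
    simplification, not the paper's exact robust operator.
    """
    greedy = next_q_values.max(axis=1)   # robust greedy value per candidate
    nominal = greedy[0]                  # value at the nominal next state
    worst = greedy.min()                 # adversarial candidate in the set
    return reward + gamma * ((1.0 - rho) * nominal + rho * worst)

# Example: 3 candidate next states, 4 joint actions
q_next = np.array([[1.0, 2.0, 0.5, 1.5],
                   [0.2, 0.8, 0.1, 0.4],
                   [1.1, 0.9, 1.3, 0.7]])
print(robust_q_target(reward=1.0, next_q_values=q_next))
# 1.0 + 0.99 * (0.9 * 2.0 + 0.1 * 0.8) = 2.8612
```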
Related papers
- Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes. In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID. We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z)
- Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems [0.0]
I introduce Mechanism-Based Intelligence (MBI), a paradigm that reconceptualizes intelligence as emergent from the coordination of multiple "brains", rather than a single one. It provides a provably efficient, auditable and generalizable approach to coordinated, trustworthy and scalable multi-agent intelligence grounded in economic principles.
arXiv Detail & Related papers (2025-12-22T22:22:13Z)
- Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning [24.476713156225685]
Value decomposition is a central approach in multi-agent reinforcement learning (MARL). Existing methods either enforce monotonicity constraints, which limit expressive power, or adopt softer surrogates at the cost of algorithmic complexity. We show that unconstrained, non-monotonic factorization reliably recovers IGM-optimal solutions and consistently outperforms monotonic baselines.
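The monotonic-versus-unconstrained distinction shows up directly in how the mixing network is parameterized. A minimal PyTorch sketch, assuming state-independent mixing weights (QMIX proper conditions them on the global state; see the hypernetwork sketch further down this list):

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: nonnegative weights keep dQ_tot/dQ_i >= 0,
    so decentralized per-agent argmax matches the joint argmax."""
    def __init__(self, n_agents, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.rand(n_agents, hidden))
        self.w2 = nn.Parameter(torch.rand(hidden, 1))

    def forward(self, agent_qs):                  # (batch, n_agents)
        h = torch.relu(agent_qs @ self.w1.abs())  # abs() enforces monotonicity
        return h @ self.w2.abs()                  # (batch, 1)

class UnconstrainedMixer(nn.Module):
    """Non-monotonic mixer: a plain MLP over individual Q-values."""
    def __init__(self, n_agents, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_qs):
        return self.net(agent_qs)
```

The `abs()` on the weights is what enforces monotonicity; removing it (the `UnconstrainedMixer`) recovers the fully expressive class at the cost of the guarantee that makes decentralized greedy execution exact.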
arXiv Detail & Related papers (2025-11-12T22:49:35Z)
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- Redistributing Rewards Across Time and Agents for Multi-Agent Reinforcement Learning [14.852334980733369]
Credit assignment, disentangling each agent's contribution to a shared reward, is a critical challenge in cooperative multi-agent reinforcement learning. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), an approach that decouples credit modeling from this constraint. We demonstrate that this method is equivalent to a valid Potential-Based Reward Shaping (PBRS), which guarantees the optimal policy is preserved regardless of model accuracy.
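The PBRS guarantee invoked here is the classical one (Ng et al., 1999): adding gamma * phi(s') - phi(s) to the reward leaves the optimal policy unchanged for any potential phi. A minimal sketch of that identity (TAR$^2$ itself learns the redistribution; the `shape_trajectory` helper below is illustrative):

```python
def shape_trajectory(rewards, potentials, gamma=0.99):
    """Potential-based reward shaping along a trajectory:
    r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).

    rewards:    [r_0, ..., r_{T-1}]
    potentials: [phi(s_0), ..., phi(s_T)]  (one per state, incl. terminal)

    The shaping terms telescope, so the shaped return differs from the
    original only by -phi(s_0) (taking phi = 0 at terminal states), and
    optimal policies are preserved for any choice of phi.
    """
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]
```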
arXiv Detail & Related papers (2025-02-07T12:07:57Z)
- Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks.
However, deploying MARL agents in real-world applications presents critical safety challenges.
We propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, where safety requirements are enforced at every state the agents visit.
For practical deployment in complex high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC).
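The summary does not spell out MADAC's update, but the dual actor-critic pattern for constrained MDPs is standard: an actor maximizes reward minus a multiplier-weighted cost, while the multiplier rises when constraints are violated. A generic sketch under those assumptions (the names `dual_update` and `cost_limit` are hypothetical, not from the paper):

```python
def dual_update(lmbda, cost_estimate, cost_limit, lr=1e-2):
    """Projected gradient-ascent step on the Lagrange multiplier of a
    constrained MDP: lambda grows while the (e.g., state-wise) cost
    estimate exceeds its limit and shrinks toward zero otherwise. The
    actor would maximize Q_reward - lambda * Q_cost in the primal step.
    Generic primal-dual sketch, not MADAC's exact algorithm."""
    return max(0.0, lmbda + lr * (cost_estimate - cost_limit))
```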
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
- QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and the global Q-value.
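What a distributional perspective buys over expectation-only modeling is that each Q-value becomes a distribution over returns, with the mean recovered only when needed for greedy action selection. A minimal categorical (C51-style) sketch; the support bounds and atom count are illustrative assumptions, not DQMIX's settings:

```python
import numpy as np

def categorical_expectation(probs, v_min=-10.0, v_max=10.0):
    """Distributional view: represent a Q-value as a categorical
    distribution over fixed support atoms rather than a scalar.
    probs: (..., n_atoms) probabilities over the atoms."""
    n_atoms = probs.shape[-1]
    atoms = np.linspace(v_min, v_max, n_atoms)  # fixed support z_1..z_N
    return (probs * atoms).sum(axis=-1)         # E[Z] = sum_i p_i * z_i

probs = np.full(51, 1.0 / 51)          # uniform distribution over 51 atoms
print(categorical_expectation(probs))  # ~0.0 (mean of symmetric support)
```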
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable.
We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging.
We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces.
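The "local weighted averaging" step that follows each agent's local saddle point solve is a standard consensus/gossip operation. A one-function sketch, assuming a doubly stochastic mixing matrix over the communication graph (illustrative, not the paper's exact scheme):

```python
import numpy as np

def local_weighted_average(params, mixing_matrix):
    """One consensus round: each agent replaces its parameter vector
    with a weighted combination of its neighbors' vectors.
    params: (n_agents, dim); mixing_matrix: (n_agents, n_agents),
    doubly stochastic, with zeros for non-neighbors."""
    return mixing_matrix @ params
```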
arXiv Detail & Related papers (2021-10-22T03:48:41Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
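QMIX's mixer differs from the state-independent monotonic mixer sketched earlier in this list in that hypernetworks map the global state to the nonnegative mixing weights, so Q_tot stays monotone in each agent's Q while still depending on state. A minimal PyTorch sketch with biases omitted for brevity:

```python
import torch
import torch.nn as nn

class HyperMixer(nn.Module):
    """State-conditioned monotonic mixing, closer to QMIX proper:
    hypernetworks produce the mixing weights from the global state,
    and abs() keeps them nonnegative (hence Q_tot monotone in each Q_i)."""
    def __init__(self, n_agents, state_dim, hidden=32):
        super().__init__()
        self.hyper_w1 = nn.Linear(state_dim, n_agents * hidden)
        self.hyper_w2 = nn.Linear(state_dim, hidden)
        self.hidden = hidden

    def forward(self, agent_qs, state):  # (B, n_agents), (B, state_dim)
        B, n = agent_qs.shape
        w1 = self.hyper_w1(state).abs().view(B, n, self.hidden)
        w2 = self.hyper_w2(state).abs().view(B, self.hidden, 1)
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1))  # (B, 1, hidden)
        return torch.bmm(h, w2).squeeze(-1)                   # (B, 1)
```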
arXiv Detail & Related papers (2020-03-19T16:51:51Z)