API: Boosting Multi-Agent Reinforcement Learning via
Agent-Permutation-Invariant Networks
- URL: http://arxiv.org/abs/2203.05285v1
- Date: Thu, 10 Mar 2022 11:00:53 GMT
- Title: API: Boosting Multi-Agent Reinforcement Learning via
Agent-Permutation-Invariant Networks
- Authors: Xiaotian Hao, Weixun Wang, Hangyu Mao, Yaodong Yang, Dong Li, Yan
Zheng, Zhen Wang, Jianye Hao
- Abstract summary: Multi-agent reinforcement learning suffers from poor sample efficiency due to the exponential growth of the state-action space.
We propose two novel designs to achieve permutation invariance (PI).
The first design permutes the same but differently ordered inputs back to a common order, so the downstream networks only need to learn a function mapping over fixed-order inputs.
- Score: 35.63476630248861
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-agent reinforcement learning suffers from poor sample efficiency due to
the exponential growth of the state-action space. Considering a homogeneous
multi-agent system, a global state consisting of $m$ homogeneous components has
$m!$ differently ordered representations, so designing functions that satisfy
permutation invariance (PI) can reduce the state space by a factor of
$\frac{1}{m!}$. However, mainstream MARL algorithms ignore this property and
learn over the original state space. Previous attempts to achieve PI, including
data-augmentation-based methods and embedding-sharing architectures, suffer
from training instability or limited model capacity. In this work, we propose
two novel designs that achieve PI while avoiding these limitations. The first
design permutes the same but differently ordered inputs back to a common order;
the downstream networks then only need to learn a function mapping over
fixed-order inputs instead of over all permutations, which is much easier to
train. The second design applies a hypernetwork to generate a customized
embedding for each component, giving higher representational capacity than the
previous embedding-sharing method. Empirical results on the SMAC benchmark show
that the proposed method achieves 100% win rates in almost all hard and
super-hard scenarios (never achieved before) and improves sample efficiency
over state-of-the-art baselines by up to 400%.
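The first design can be sketched as a sorting layer: any of the $m!$ orderings of the same set of components is mapped to one canonical ordering before the network sees it. A minimal NumPy sketch, where the lexicographic sort key and the single linear layer are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

def sort_then_encode(components, weights):
    """Map a set of m homogeneous components to a fixed-order input.

    Sorting by a content-based key sends all m! orderings of the same
    set to one canonical ordering, so the downstream network only has
    to fit a function over fixed-order inputs.
    """
    comps = np.asarray(components, dtype=float)
    # Canonical order: lexicographic sort over component features
    # (first feature is the primary key).
    order = np.lexsort(comps.T[::-1])
    canonical = comps[order]
    # Stand-in downstream network: a single linear layer.
    return canonical.reshape(-1) @ weights

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # m = 3 components, 4 features each
w = rng.normal(size=12)
perm = x[[2, 0, 1]]           # a differently ordered copy of the same set
assert np.isclose(sort_then_encode(x, w), sort_then_encode(perm, w))
```

Any fixed, content-based sort key works; the point is only that every permutation of the same set collapses to one input.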
Related papers
- Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs [49.71319907864573]
In this paper, we propose a multi-agent skill discovery method that exploits the ease of decomposition.
Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector.
Considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method.
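The Kronecker structure is what makes the joint spectrum tractable: the eigenvectors of a Kronecker product of adjacency matrices are Kronecker products of the factors' eigenvectors, and the eigenvalues multiply. A small NumPy check of this property (illustrative only; the paper works with the Laplacian's Fiedler vector and adds approximations for that case):

```python
import numpy as np

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path P3
A2 = np.array([[0, 1], [1, 0]], dtype=float)                   # edge K2

w1, V1 = np.linalg.eigh(A1)   # factor spectra (cheap: small matrices)
w2, V2 = np.linalg.eigh(A2)

A = np.kron(A1, A2)           # joint adjacency of the product graph
v = np.kron(V1[:, 0], V2[:, 0])   # Kronecker product of factor eigenvectors
lam = w1[0] * w2[0]               # eigenvalues multiply under kron
assert np.allclose(A @ v, lam * v)
```

Because the joint spectrum is assembled from the factor spectra, one never has to eigendecompose the (possibly huge) joint matrix directly.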
arXiv Detail & Related papers (2023-07-21T14:53:12Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
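The thresholding idea can be sketched in isolation: fit a lasso estimate of the (sparse) reward parameter, then keep only coefficients above a threshold as the estimated support. A single-agent NumPy sketch using ISTA; the cooperative, row-wise-distributed protocol and the bandit loop are omitted, and all names and constants here are ours:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 via ISTA."""
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1/L of the smooth part
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding: the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

rng = np.random.default_rng(1)
n, d = 200, 20
X = rng.normal(size=(n, d))
true_b = np.zeros(d)
true_b[[2, 7]] = [1.5, -2.0]          # sparse ground-truth parameter
y = X @ true_b + 0.05 * rng.normal(size=n)

b_hat = lasso_ista(X, y, lam=0.1)
tau = 0.3                              # threshold step: keep large coefficients
support = set(np.flatnonzero(np.abs(b_hat) > tau))
assert support == {2, 7}
```

Restricting subsequent estimation to the recovered support is what keeps per-round computation low in high dimensions.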
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Federated Learning Using Variance Reduced Stochastic Gradient for
Probabilistically Activated Agents [0.0]
This paper proposes a two-layer Federated Learning (FL) algorithm that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of being selected in each iteration.
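The interplay of probabilistic activation and variance reduction can be sketched with an SVRG-style update: an active agent's gradient correction is divided by its activation probability so the aggregate stays unbiased, while a periodically refreshed snapshot gradient keeps the variance low. An illustrative NumPy sketch on simple quadratic losses (not the paper's algorithm or notation):

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5 * ||x - c_i||^2 and is active with
# probability p_i each round; the goal is to minimise the average loss.
rng = np.random.default_rng(2)
m, d = 5, 3
C = rng.normal(size=(m, d))          # per-agent optima c_i
p = np.full(m, 0.6)                  # activation probabilities
x = np.zeros(d)
snap = x.copy()
g_snap = (snap - C).mean(axis=0)     # full gradient at the snapshot
for t in range(400):
    active = rng.random(m) < p
    corr = np.zeros(d)
    for i in np.flatnonzero(active):
        # Divide by p_i: importance weighting keeps the estimate unbiased.
        corr += ((x - C[i]) - (snap - C[i])) / p[i]
    g = g_snap + corr / m            # unbiased, variance-reduced gradient
    x -= 0.5 * g
    if t % 20 == 19:                 # periodically refresh the snapshot
        snap = x.copy()
        g_snap = (snap - C).mean(axis=0)

opt = C.mean(axis=0)                 # minimiser of the average loss
assert np.linalg.norm(x - opt) < 1e-3
```

Because the correction term vanishes as `x` approaches the snapshot, the noise injected by random activation shrinks with the error instead of imposing a fixed variance floor.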
arXiv Detail & Related papers (2022-10-25T22:04:49Z) - Cluster and Aggregate: Face Recognition with Large Probe Set [18.662943303044315]
We propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can both scale to large $N$ and maintain the ability to perform sequential inference with order invariance.
Experiments on IJB-B and IJB-S benchmark datasets show the superiority of the proposed two-stage paradigm in unconstrained face recognition.
arXiv Detail & Related papers (2022-10-19T20:01:15Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address occlusion by employing body clues provided by an extra network to distinguish the visible parts.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z) - Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks.
Tested on ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision tasks of image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z) - A Fast Graph Neural Network-Based Method for Winner Determination in
Multi-Unit Combinatorial Auctions [44.14410999484577]
Combinatorial Auction (CA) is an efficient mechanism for resource allocation in different fields, including cloud computing.
The problem of allocating items among the bidders to maximize the auctioneer's revenue is NP-complete and inapproximable.
We propose leveraging machine learning (ML) techniques to develop a novel low-complexity algorithm for solving this problem with negligible revenue loss.
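For scale, the exact winner-determination problem that the learned algorithm approximates can only be solved by brute force on tiny instances. A single-unit Python sketch (the multi-unit generalization and the paper's GNN are omitted; the bid format is ours):

```python
from itertools import combinations

def winner_determination(bids):
    """Exact winner determination by brute force.

    Each bid is (bidder, frozenset_of_items, price); choose a
    revenue-maximising subset of bids whose item sets are pairwise
    disjoint. Exponential in the number of bids, which is exactly why
    a low-complexity learned heuristic is attractive.
    """
    best_rev, best_set = 0.0, ()
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            items = [it for _, s, _ in subset for it in s]
            if len(items) == len(set(items)):   # pairwise disjoint item sets
                rev = sum(price for _, _, price in subset)
                if rev > best_rev:
                    best_rev, best_set = rev, subset
    return best_rev, best_set

bids = [("a", frozenset({1, 2}), 5.0),
        ("b", frozenset({2, 3}), 4.0),
        ("c", frozenset({3}), 2.0)]
rev, winners = winner_determination(bids)
assert rev == 7.0 and {b[0] for b in winners} == {"a", "c"}
```

Even this three-bid instance already scans 7 candidate subsets; the count doubles with every additional bid.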
arXiv Detail & Related papers (2020-09-29T00:22:37Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.