Related papers: Scalable Reinforcement Learning Policies for Multi-Agent Control

Scalable Reinforcement Learning Policies for Multi-Agent Control

URL: http://arxiv.org/abs/2011.08055v4
Date: Wed, 10 Nov 2021 14:42:22 GMT
Title: Scalable Reinforcement Learning Policies for Multi-Agent Control
Authors: Christopher D. Hsu, Heejin Jeong, George J. Pappas, and Pratik Chaudhari
Abstract summary: We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. We show results for tasks consisting of up to 1000 pursuers tracking 1000 targets.
Score: 29.42370205354368
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. Our method can handle an arbitrary number of pursuers and targets; we show results for tasks consisting up to 1000 pursuers tracking 1000 targets. We use a decentralized, partially-observable Markov Decision Process framework to model pursuers as agents receiving partial observations (range and bearing) about targets which move using fixed, unknown policies. An attention mechanism is used to parameterize the value function of the agents; this mechanism allows us to handle an arbitrary number of targets. Entropy-regularized off-policy RL methods are used to train a stochastic policy, and we discuss how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution. We further develop a masking heuristic that allows training on smaller problems with few pursuers-targets and execution on much larger problems. Thorough simulation experiments, ablation studies, and comparisons to state of the art algorithms are performed to study the scalability of the approach and robustness of performance to varying numbers of agents and targets.

Related papers

MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning [68.91090643731987]
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems.<n>Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective.<n>We propose MO-mix to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem.
arXiv Detail & Related papers (2026-02-28T16:25:22Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment [4.406086834602686]
We show how to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution. Our approach utilizes a centralized reward-critic which numerically decomposes the environment reward based on the individual contribution of each agent. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints.
arXiv Detail & Related papers (2025-02-24T05:56:47Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898]
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
arXiv Detail & Related papers (2024-10-18T03:02:44Z)
MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework. It works as both a decentralized policy and a centralized controller. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
Emergent Behaviors in Multi-Agent Target Acquisition [0.0]
We simulate a Multi-Agent System (MAS) using Reinforcement Learning (RL) in a pursuit-evasion game. We create different adversarial scenarios by replacing RL-trained pursuers' policies with two distinct (non-RL) analytical strategies. The novelty of our approach entails the creation of an influential feature set that reveals underlying data regularities.
arXiv Detail & Related papers (2022-12-15T15:20:58Z)
Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems. We first propose a group of value-based RL approaches for MacDec-POMDPs. We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z)
RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA takes advantage of a graph-based encoder relation to encode the topological structure between agents. Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning [20.401609420707867]
We propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL) Our method achieves better performance regarding formation error, formation convergence rate and on-par success rate of obstacle avoidance compared with baselines.
arXiv Detail & Related papers (2021-11-14T13:02:45Z)
Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner. Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
Scalable, Decentralized Multi-Agent Reinforcement Learning Methods Inspired by Stigmergy and Ant Colonies [0.0]
We investigate a novel approach to decentralized multi-agent learning and planning. In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies. The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification.
arXiv Detail & Related papers (2021-05-08T01:04:51Z)
A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples. We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples. Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors. In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency. We propose setting up an automatic curriculum for goals that the agent needs to solve. We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension. We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.