Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes
- URL: http://arxiv.org/abs/2002.04258v3
- Date: Fri, 30 Jun 2023 19:09:17 GMT
- Title: Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes
- Authors: Vahid Balazadeh, Abir De, Adish Singla, and Manuel Gomez-Rodriguez
- Abstract summary: We develop algorithms that, by learning to switch control between agents, allow existing reinforcement learning agents to operate under different automation levels.
The total regret of our algorithm with respect to the optimal switching policy is sublinear in the number of learning steps.
Simulation experiments in an obstacle avoidance task illustrate our theoretical findings and demonstrate that, by exploiting the specific structure of the problem, our proposed algorithm is superior to problem-agnostic algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning agents have been mostly developed and evaluated under
the assumption that they will operate in a fully autonomous manner -- they will
take all actions. In this work, our goal is to develop algorithms that, by
learning to switch control between agents, allow existing reinforcement
learning agents to operate under different automation levels. To this end, we
first formally define the problem of learning to switch control among agents in
a team via a 2-layer Markov decision process. Then, we develop an online
learning algorithm that uses upper confidence bounds on the agents' policies
and the environment's transition probabilities to find a sequence of switching
policies. The total regret of our algorithm with respect to the optimal
switching policy is sublinear in the number of learning steps and, whenever
multiple teams of agents operate in a similar environment, our algorithm
greatly benefits from maintaining shared confidence bounds for the
environments' transition probabilities and it enjoys a better regret bound than
problem-agnostic algorithms. Simulation experiments in an obstacle avoidance
task illustrate our theoretical findings and demonstrate that, by exploiting
the specific structure of the problem, our proposed algorithm is superior to
problem-agnostic algorithms.
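To make the regret claim concrete, a standard episodic-regret definition consistent with the abstract (the paper's exact formulation may differ) compares the value of the learner's switching policy pi_k in each episode k against the optimal switching policy pi^*, from the episode's initial state:

```latex
% V^{\pi}(s) denotes the expected return of switching policy \pi from state s.
R(K) = \sum_{k=1}^{K} \left( V^{\pi^{*}}(s_{k,1}) - V^{\pi_{k}}(s_{k,1}) \right)
```

The sketch below illustrates the flavor of the approach: an upper layer chooses, in every state, which agent acts, and optimism via confidence bounds on estimated transition probabilities drives exploration. Everything here is an assumption of the sketch rather than the paper's algorithm: the tabular sizes, the Hoeffding-style bonus, the fixed switching cost, and the collapse of each agent's unknown policy and the environment into a single per-agent transition kernel.

```python
import numpy as np

n_states, n_agents, horizon = 5, 2, 10
rng = np.random.default_rng(0)

# Unknown ground truth: P_true[d, s, s'] = Pr(next state s' | agent d controls
# in state s). Each kernel folds the agent's fixed policy and the environment
# dynamics together (a sketch-level simplification).
P_true = rng.dirichlet(np.ones(n_states), size=(n_agents, n_states))
reward = rng.uniform(size=n_states)               # state-based reward
switch_cost = 0.05                                # penalty for changing agents

counts = np.ones((n_agents, n_states, n_states))  # smoothed transition counts

def optimistic_values(P_hat, bonus):
    """Finite-horizon backward induction with an optimism bonus per (agent, state)."""
    V = np.zeros((horizon + 1, n_states))
    Q = np.zeros((horizon, n_agents, n_states))
    for h in range(horizon - 1, -1, -1):
        Q[h] = reward + P_hat @ V[h + 1] + bonus  # shape (n_agents, n_states)
        V[h] = Q[h].max(axis=0)
    return Q

for episode in range(200):
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    bonus = np.sqrt(1.0 / counts.sum(axis=2))     # Hoeffding-style exploration bonus
    Q = optimistic_values(P_hat, bonus)
    s, prev_agent = 0, None
    for h in range(horizon):
        q = Q[h, :, s].copy()
        if prev_agent is not None:                # discourage needless hand-offs
            q -= switch_cost * (np.arange(n_agents) != prev_agent)
        d = int(q.argmax())                       # upper-layer action: who controls
        s_next = rng.choice(n_states, p=P_true[d, s])
        counts[d, s, s_next] += 1
        s, prev_agent = s_next, d
```

In this picture, the abstract's shared-confidence-bound idea for multiple teams in similar environments corresponds to sharing the `counts` statistics for the environment's transitions across teams, which shrinks the exploration bonus faster than learning each environment from scratch.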
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels [21.860440468189044]
Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments.
In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process.
We introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels.
arXiv Detail & Related papers (2023-12-21T19:21:19Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient-tracking communication mechanism and two different gradient estimators (see the gradient-tracking sketch after this list).
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Robust Reinforcement Learning via Genetic Curriculum [5.421464476555662]
Genetic curriculum is an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum.
Our empirical studies show improvement in robustness over existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail.
arXiv Detail & Related papers (2022-02-17T01:14:20Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent's local policy is kept private and thus cannot be shared with others.
Policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov decision processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - SA-MATD3:Self-attention-based multi-agent continuous control method in
cooperative environments [12.959163198988536]
Existing algorithms suffer from uneven learning among agents as the number of agents increases.
A new structure for a multi-agent actor-critic is proposed, and a self-attention mechanism is applied in the critic network.
The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents.
arXiv Detail & Related papers (2021-07-01T08:15:05Z) - Scalable, Decentralized Multi-Agent Reinforcement Learning Methods
Inspired by Stigmergy and Ant Colonies [0.0]
We investigate a novel approach to decentralized multi-agent learning and planning.
In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies.
The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification.
arXiv Detail & Related papers (2021-05-08T01:04:51Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)