Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes
- URL: http://arxiv.org/abs/2002.04258v3
- Date: Fri, 30 Jun 2023 19:09:17 GMT
- Title: Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes
- Authors: Vahid Balazadeh, Abir De, Adish Singla, and Manuel Gomez-Rodriguez
- Abstract summary: We develop algorithms that, by learning to switch control between agents, allow existing reinforcement learning agents to operate under different automation levels.
The total regret of our algorithm with respect to the optimal switching policy is sublinear in the number of learning steps.
Simulation experiments in an obstacle avoidance task illustrate our theoretical findings and demonstrate that, by exploiting the specific structure of the problem, our proposed algorithm is superior to problem-agnostic algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning agents have been mostly developed and evaluated under
the assumption that they will operate in a fully autonomous manner -- they will
take all actions. In this work, our goal is to develop algorithms that, by
learning to switch control between agents, allow existing reinforcement
learning agents to operate under different automation levels. To this end, we
first formally define the problem of learning to switch control among agents in
a team via a 2-layer Markov decision process. Then, we develop an online
learning algorithm that uses upper confidence bounds on the agents' policies
and the environment's transition probabilities to find a sequence of switching
policies. The total regret of our algorithm with respect to the optimal
switching policy is sublinear in the number of learning steps and, whenever
multiple teams of agents operate in a similar environment, our algorithm
greatly benefits from maintaining shared confidence bounds for the
environments' transition probabilities and it enjoys a better regret bound than
problem-agnostic algorithms. Simulation experiments in an obstacle avoidance
task illustrate our theoretical findings and demonstrate that, by exploiting
the specific structure of the problem, our proposed algorithm is superior to
problem-agnostic algorithms.
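To make the regret claim concrete, a standard episodic-regret definition consistent with the abstract (the paper's exact formulation may differ) compares the value of the learner's switching policy pi_k in each episode k against the optimal switching policy pi^*, from the episode's initial state:

```latex
% V^{\pi}(s) denotes the expected return of switching policy \pi from state s.
R(K) = \sum_{k=1}^{K} \left( V^{\pi^{*}}(s_{k,1}) - V^{\pi_{k}}(s_{k,1}) \right)
```

The sketch below illustrates the flavor of the approach: an upper layer chooses, in every state, which agent acts, and optimism via confidence bounds on estimated transition probabilities drives exploration. Everything here is an assumption of the sketch rather than the paper's algorithm: the tabular sizes, the Hoeffding-style bonus, the fixed switching cost, and the collapse of each agent's unknown policy and the environment into a single per-agent transition kernel.

```python
import numpy as np

n_states, n_agents, horizon = 5, 2, 10
rng = np.random.default_rng(0)

# Unknown ground truth: P_true[d, s, s'] = Pr(next state s' | agent d controls
# in state s). Each kernel folds the agent's fixed policy and the environment
# dynamics together (a sketch-level simplification).
P_true = rng.dirichlet(np.ones(n_states), size=(n_agents, n_states))
reward = rng.uniform(size=n_states)               # state-based reward
switch_cost = 0.05                                # penalty for changing agents

counts = np.ones((n_agents, n_states, n_states))  # smoothed transition counts

def optimistic_values(P_hat, bonus):
    """Finite-horizon backward induction with an optimism bonus per (agent, state)."""
    V = np.zeros((horizon + 1, n_states))
    Q = np.zeros((horizon, n_agents, n_states))
    for h in range(horizon - 1, -1, -1):
        Q[h] = reward + P_hat @ V[h + 1] + bonus  # shape (n_agents, n_states)
        V[h] = Q[h].max(axis=0)
    return Q

for episode in range(200):
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    bonus = np.sqrt(1.0 / counts.sum(axis=2))     # Hoeffding-style exploration bonus
    Q = optimistic_values(P_hat, bonus)
    s, prev_agent = 0, None
    for h in range(horizon):
        q = Q[h, :, s].copy()
        if prev_agent is not None:                # discourage needless hand-offs
            q -= switch_cost * (np.arange(n_agents) != prev_agent)
        d = int(q.argmax())                       # upper-layer action: who controls
        s_next = rng.choice(n_states, p=P_true[d, s])
        counts[d, s, s_next] += 1
        s, prev_agent = s_next, d
```

In this picture, the abstract's shared-confidence-bound idea for multiple teams in similar environments corresponds to sharing the `counts` statistics for the environment's transitions across teams, which shrinks the exploration bonus faster than learning each environment from scratch.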
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels [21.860440468189044]
Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments.
In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process.
We introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels.
arXiv Detail & Related papers (2023-12-21T19:21:19Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient-tracking communication mechanism and two different gradient estimators (see the gradient-tracking sketch after this list).
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Robust Reinforcement Learning via Genetic Curriculum [5.421464476555662]
Genetic curriculum is an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum.
Our empirical studies show improvement in robustness over existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail.
arXiv Detail & Related papers (2022-02-17T01:14:20Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent's local policy is kept private and thus cannot be shared with others.
Policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov decision processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - SA-MATD3:Self-attention-based multi-agent continuous control method in
cooperative environments [12.959163198988536]
Existing algorithms suffer from uneven learning among agents as the number of agents increases.
A new structure for a multi-agent actor-critic is proposed, and a self-attention mechanism is applied in the critic network.
The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents.
arXiv Detail & Related papers (2021-07-01T08:15:05Z) - Scalable, Decentralized Multi-Agent Reinforcement Learning Methods
Inspired by Stigmergy and Ant Colonies [0.0]
We investigate a novel approach to decentralized multi-agent learning and planning.
In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies.
The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification.
arXiv Detail & Related papers (2021-05-08T01:04:51Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)