Safe Multi-agent Learning via Trapping Regions
- URL: http://arxiv.org/abs/2302.13844v2
- Date: Tue, 16 May 2023 21:34:14 GMT
- Title: Safe Multi-agent Learning via Trapping Regions
- Authors: Aleksander Czechowski, Frans A. Oliehoek
- Abstract summary: We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
- Score: 89.24858306636816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the main challenges of multi-agent learning lies in establishing
convergence of the algorithms, as, in general, a collection of individual,
self-serving agents is not guaranteed to converge with their joint policy, when
learning concurrently. This is in stark contrast to most single-agent
environments, and sets a prohibitive barrier for deployment in practical
applications, as it induces uncertainty in long term behavior of the system. In
this work, we apply the concept of trapping regions, known from qualitative
theory of dynamical systems, to create safety sets in the joint strategy space
for decentralized learning. We propose a binary partitioning algorithm for
verification that candidate sets form trapping regions in systems with known
learning dynamics, and a heuristic sampling algorithm for scenarios where
learning dynamics are not known. We demonstrate the applications to a
regularized version of Dirac Generative Adversarial Network, a
four-intersection traffic control scenario run in a state-of-the-art
open-source microscopic traffic simulator SUMO, and a mathematical model of
economic competition.
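The heuristic sampling check described in the abstract can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's actual algorithm: it takes a candidate axis-aligned box in joint strategy space, samples points on its boundary, and tests whether the (possibly black-box) learning dynamics point inward everywhere sampled. The function names, the box parametrization, and the dot-product inward test are all assumptions for illustration.

```python
import numpy as np

def points_inward(dynamics, x, normal):
    """True if the learning-dynamics vector at boundary point x
    points into the box (negative component along the outward normal)."""
    return np.dot(dynamics(x), normal) < 0.0

def sample_check_trapping_region(dynamics, lower, upper, n_samples=1000, rng=None):
    """Heuristic check that the box [lower, upper] is a trapping region:
    sample random points on the faces and verify the flow crosses inward.
    A single outward-crossing sample refutes the candidate; passing all
    samples is evidence, not proof."""
    rng = np.random.default_rng(rng)
    d = len(lower)
    for _ in range(n_samples):
        x = rng.uniform(lower, upper)
        face = rng.integers(d)                       # coordinate of the face
        side = rng.integers(2)                       # lower (0) or upper (1)
        x[face] = lower[face] if side == 0 else upper[face]
        normal = np.zeros(d)
        normal[face] = -1.0 if side == 0 else 1.0    # outward unit normal
        if not points_inward(dynamics, x, normal):
            return False
    return True
```

For example, for the gradient-descent dynamics `lambda x: -x` every trajectory flows toward the origin, so the box `[-1, 1]^2` passes the check, while the repelling dynamics `lambda x: x` fails it at every boundary sample.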
Related papers
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- On the dynamics of multi agent nonlinear filtering and learning [2.206852421529135]
Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics.
This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics.
arXiv Detail & Related papers (2023-09-07T08:39:53Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
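The maximum-variance-reduction idea mentioned above can be sketched with a hand-rolled RBF Gaussian Process: query next the candidate input where the GP posterior is most uncertain. This is an illustrative sketch, not the paper's method; the kernel choice, hyperparameters, and function names are assumptions.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel matrix between row-stacked point sets."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d / ls ** 2)

def posterior_variance(X_train, X_cand, noise=1e-3, ls=1.0):
    """GP posterior variance at candidate points given observed inputs.
    var(x) = k(x, x) - k(x, X) K^-1 k(X, x); k(x, x) = 1 for the RBF kernel."""
    K = rbf(X_train, X_train, ls) + noise * np.eye(len(X_train))
    Ks = rbf(X_cand, X_train, ls)
    return 1.0 - np.einsum('ij,jk,ik->i', Ks, np.linalg.inv(K), Ks)

def max_variance_query(X_train, X_cand):
    """Maximum-variance heuristic: query where the GP is most uncertain."""
    return X_cand[np.argmax(posterior_variance(X_train, X_cand))]
```

With a single observation at the origin, a far-away candidate has posterior variance near the prior value 1 and is selected over a candidate that coincides with the training point.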
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access [3.441021278275805]
In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states but not the transition probabilities involved.
In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as the internal model.
arXiv Detail & Related papers (2023-03-22T16:58:44Z)
- Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning [9.939081691797858]
We present a reinforcement learning (RL) approach for the multi-agent efficient domain coverage problem involving agents with second-order dynamics.
Our proposed network architecture includes the incorporation of LSTM and self-attention, which allows the trained policy to adapt to a variable number of agents.
arXiv Detail & Related papers (2022-11-11T01:59:12Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
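Why antisymmetry yields a hard momentum guarantee can be seen in a toy sketch: if every pairwise interaction satisfies f(x_i - x_j) = -f(x_j - x_i), the pairwise forces cancel in the total and net momentum change is exactly zero. This is a minimal illustration of the principle, not the paper's antisymmetrical continuous convolutional layers; the kernel and names are assumptions.

```python
import numpy as np

def antisymmetric_forces(positions, kernel):
    """Per-particle forces from a pairwise antisymmetric interaction:
    f_ij = kernel(x_i - x_j) with kernel(-r) = -kernel(r), so f_ij = -f_ji
    and the summed force (net momentum change) is exactly zero."""
    n = len(positions)
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                forces[i] += kernel(positions[i] - positions[j])
    return forces

# Any odd function of the displacement is a valid antisymmetric kernel.
kernel = lambda r: -r * np.exp(-np.dot(r, r))

positions = np.random.default_rng(0).normal(size=(5, 2))
forces = antisymmetric_forces(positions, kernel)
# Net momentum change vanishes up to floating-point error.
assert np.allclose(forces.sum(axis=0), 0.0)
```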
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Deep Reinforcement Learning for Distributed and Uncoordinated Cognitive Radios Resource Allocation [1.218340575383456]
This paper presents a novel deep reinforcement learning-based resource allocation technique for the multi-agent environment presented by a cognitive radio network.
The presented algorithm converges in an arbitrarily long time to equilibrium policies in the non-stationary environment.
It is shown that the use of a standard single-agent deep reinforcement learning approach may not achieve convergence when used in an uncoordinated interacting multi-radio scenario.
arXiv Detail & Related papers (2022-05-27T12:43:30Z)
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)