Bandit approach to conflict-free multi-agent Q-learning in view of
photonic implementation
- URL: http://arxiv.org/abs/2212.09926v1
- Date: Tue, 20 Dec 2022 00:27:29 GMT
- Title: Bandit approach to conflict-free multi-agent Q-learning in view of
photonic implementation
- Authors: Hiroaki Shinkawa, Nicolas Chauvet, André Röhm, Takatomo Mihana,
Ryoichi Horisaki, Guillaume Bachelier, and Makoto Naruse
- Abstract summary: Previous studies have used quantum interference of photons to solve the competitive multi-armed bandit problem.
This study extends the conventional approach to a more general multi-agent reinforcement learning.
A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Photonic reinforcement learning, which exploits the physical
nature of light to accelerate computation, has been studied extensively in
recent years. Previous studies utilized quantum interference of photons to achieve
collective decision-making without choice conflicts when solving the
competitive multi-armed bandit problem, a fundamental example of reinforcement
learning. However, the bandit problem deals with a static environment where the
agent's actions do not influence the reward probabilities. This study aims to
extend the conventional approach to more general multi-agent reinforcement
learning, targeting the grid world problem. Unlike the conventional approach,
the proposed scheme deals with a dynamic environment where the reward changes
because of agents' actions. A successful photonic reinforcement learning scheme
requires both a photonic system that contributes to the quality of learning and
a suitable algorithm. This study proposes a novel learning algorithm,
discontinuous bandit Q-learning, in view of a potential photonic
implementation. Here, each state-action pair in the environment is regarded as
a slot machine in the context of the bandit problem, and the update amount of
the Q-value is regarded as the bandit reward. We perform numerical
simulations to validate the effectiveness of the bandit algorithm. In addition,
we propose a multi-agent architecture in which agents are indirectly connected
through quantum interference of light and quantum principles ensure the
conflict-free property of state-action pair selections among agents. We
demonstrate that multi-agent reinforcement learning can be accelerated owing to
conflict avoidance among multiple agents.
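To make the mapping concrete, here is a minimal illustrative sketch (an
assumption-laden stand-in, not the paper's discontinuous bandit Q-learning,
whose details go beyond this abstract): a tabular Q-learner that treats each
(state, action) pair as a bandit arm and feeds the magnitude of each Q-value
update back as that arm's reward. The class name and the epsilon-greedy arm
selection are illustrative choices.

```python
import random
from collections import defaultdict

# Illustrative sketch only (assumed details, not the paper's algorithm):
# tabular Q-learning in which each (state, action) pair is treated as a
# bandit arm and the magnitude of its Q-value update serves as the bandit
# reward, mirroring the mapping described in the abstract.

class BanditQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # Q(s, a)
        self.arm_value = defaultdict(float)  # mean bandit reward per arm
        self.arm_count = defaultdict(int)
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        # Arms whose recent Q-updates were large are still "learning" and
        # therefore worth pulling; explore uniformly with prob. epsilon.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.arm_value[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update ...
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        delta = self.alpha * (reward + self.gamma * best_next
                              - self.q[(state, action)])
        self.q[(state, action)] += delta
        # ... whose magnitude |delta| is the payout of arm (state, action).
        arm = (state, action)
        self.arm_count[arm] += 1
        self.arm_value[arm] += (abs(delta) - self.arm_value[arm]) / self.arm_count[arm]
```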
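The conflict-free property that the paper obtains physically from quantum
interference can be emulated classically for intuition: draw the agents'
state-action choices jointly without replacement, so no two agents ever select
the same pair. This stand-in is purely illustrative; the proposed architecture
needs no central coordinator because quantum principles enforce the guarantee.

```python
import random

# Classical stand-in for the conflict-free selection that the proposed
# architecture obtains from quantum interference of photons (illustrative
# assumption; the actual scheme has no central coordinator).

def conflict_free_selection(preferences):
    """preferences: one dict per agent mapping (state, action) arms to weights."""
    taken, choices = set(), []
    for prefs in preferences:  # draw without replacement across agents
        available = {arm: w for arm, w in prefs.items() if arm not in taken}
        assert available, "needs at least as many distinct arms as agents"
        arms, weights = zip(*available.items())
        pick = random.choices(arms, weights=weights, k=1)[0]
        taken.add(pick)
        choices.append(pick)
    return choices

# Two agents with overlapping preferences never collide:
agents = [
    {("s0", "up"): 0.7, ("s0", "right"): 0.3},
    {("s0", "up"): 0.6, ("s0", "left"): 0.4},
]
print(conflict_free_selection(agents))
```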
Related papers
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Scalable Task-Driven Robotic Swarm Control via Collision Avoidance and Learning Mean-Field Control [23.494528616672024]
We use state-of-the-art mean-field control techniques to convert many-agent swarm control into classical single-agent control of distributions.
Here, we combine collision avoidance and learning of mean-field control into a unified framework for tractably designing intelligent robotic swarm behavior.
arXiv Detail & Related papers (2022-09-15T16:15:04Z)
- Quantum bandit with amplitude amplification exploration in an adversarial environment [9.563657204041682]
We propose a quantum-inspired bandit learning approach for the learning-and-adapting-based offloading problem.
A new action update strategy and a novel probabilistic action selection scheme are adopted, inspired by amplitude amplification and collapse in quantum theory.
The proposed algorithm is generalized, via the devised mapping, for better learning weight adjustments on favourable/unfavourable actions.
arXiv Detail & Related papers (2022-08-15T12:40:34Z)
- Parallel bandit architecture based on laser chaos for reinforcement learning [0.0]
Photonics is an active field of study aiming to exploit the unique properties of photons.
In this study, we organize a new architecture for multi-state reinforcement learning as a parallel array of bandit problems (PBRL).
We find that the variety of states that the system undergoes during the learning phase exhibits completely different properties between PBRL and Q-learning.
arXiv Detail & Related papers (2022-05-19T13:12:21Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments [12.959163198988536]
Existing algorithms suffer from uneven learning across agents as the number of agents increases.
A new structure for a multi-agent actor-critic is proposed, and the self-attention mechanism is applied in the critic network.
The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents.
arXiv Detail & Related papers (2021-07-01T08:15:05Z)
- Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates the challenge of learning in large action spaces by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. (A toy sketch of both ingredients follows this list.)
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
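As referenced in the SUNRISE entry above, here is a toy NumPy sketch of its
two ingredients. Hedged: the real method operates on deep Q-ensembles, and
the exact sigmoid weighting form below is an assumption modeled on the
re-weighting the summary describes.

```python
import numpy as np

# Toy sketch of SUNRISE's two ingredients from the summary above. The real
# method uses deep Q-ensembles; the sigmoid weighting form below is an
# assumption modeled on the described re-weighting.

rng = np.random.default_rng(0)
n_ensemble, n_actions = 5, 4
q = rng.normal(size=(n_ensemble, n_actions))  # Q-estimates for one state

def ucb_action(q, lam=1.0):
    # (b) pick the action with the highest upper-confidence bound
    # mean_Q + lam * std_Q over the ensemble.
    return int(np.argmax(q.mean(axis=0) + lam * q.std(axis=0)))

def bellman_weight(q, action, temperature=10.0):
    # (a) down-weight Bellman targets where the ensemble disagrees:
    # sigmoid(-std * T) + 0.5, which lies in (0.5, 1.0).
    sigma = q.std(axis=0)[action]
    return 1.0 / (1.0 + np.exp(sigma * temperature)) + 0.5

a = ucb_action(q)
print(f"UCB action = {a}, target weight = {bellman_weight(q, a):.3f}")
```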