Parallel bandit architecture based on laser chaos for reinforcement
learning
- URL: http://arxiv.org/abs/2205.09543v1
- Date: Thu, 19 May 2022 13:12:21 GMT
- Title: Parallel bandit architecture based on laser chaos for reinforcement
learning
- Authors: Takashi Urushibara, Nicolas Chauvet, Satoshi Kochi, Satoshi Sunada,
Kazutaka Kanno, Atsushi Uchida, Ryoichi Horisaki, Makoto Naruse
- Abstract summary: Accelerating artificial intelligence by photonics is an active field of study aiming to exploit the unique properties of photons.
In this study, we organize a new architecture for multi-state reinforcement learning as a parallel array of bandit problems.
We find that the variety of states that the system undergoes during the learning phase exhibits completely different properties between PBRL and Q-learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerating artificial intelligence by photonics is an active field of study
aiming to exploit the unique properties of photons. Reinforcement learning is
an important branch of machine learning, and photonic decision-making
principles have been demonstrated with respect to the multi-armed bandit
problems. However, reinforcement learning could involve a massive number of
states, unlike previously demonstrated bandit problems where the number of
states is only one. Q-learning is a well-known approach in reinforcement
learning that can deal with many states. The architecture of Q-learning,
however, does not fit well with photonic implementations because its update rule is separated from the action selection. In this study, we organize a new architecture for multi-state reinforcement learning as a parallel array of bandit problems in order to benefit from photonic decision-makers, which we call the parallel bandit architecture for reinforcement learning, or PBRL for short.
Taking a cart-pole balancing problem as an instance, we demonstrate that PBRL
adapts to the environment in fewer time steps than Q-learning. Furthermore,
PBRL yields faster adaptation when operated with a chaotic laser time series than with uniformly distributed pseudorandom numbers, the autocorrelation inherent in the laser chaos providing a positive effect. We also
find that the variety of states that the system undergoes during the learning
phase exhibits completely different properties between PBRL and Q-learning. The
insights obtained through the present study are also beneficial for existing computing platforms, not just photonic realizations, in accelerating performance through the PBRL algorithm and correlated random sequences.
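The PBRL idea lends itself to a compact illustration: keep one bandit per state and let that bandit alone handle both action selection and its own value update, so that selection and update are not separated the way they are in Q-learning. The Python sketch below is a minimal, hypothetical rendering of that structure on cart-pole; the gymnasium environment, the epsilon-greedy decision rule driven by ordinary pseudorandom numbers, the state discretization, and the episode-length credit signal are all illustrative assumptions rather than the paper's actual design, which drives decisions with a chaotic laser time series.

```python
# A minimal, hypothetical sketch of the "parallel array of bandit problems"
# idea on cart-pole: one independent two-armed bandit per discretized state.
# Assumptions not taken from the paper: the gymnasium package, an
# epsilon-greedy decision rule with pseudorandom numbers, and an
# episode-length credit signal standing in for the paper's update rule.
import random
from collections import defaultdict

import gymnasium as gym  # assumption: gymnasium is installed

N_BINS = 6        # bins per observation dimension (illustrative choice)
EPSILON = 0.1     # exploration rate of each per-state bandit
N_EPISODES = 200

def discretize(obs, low=(-2.4, -3.0, -0.21, -3.0), high=(2.4, 3.0, 0.21, 3.0)):
    """Map the continuous cart-pole observation to a discrete state tuple."""
    idx = []
    for x, lo, hi in zip(obs, low, high):
        x = min(max(x, lo), hi - 1e-9)
        idx.append(int((x - lo) / (hi - lo) * N_BINS))
    return tuple(idx)

class TwoArmedBandit:
    """Running-mean value estimates for one state's two actions."""
    def __init__(self, n_arms=2):
        self.value = [0.0] * n_arms
        self.count = [0] * n_arms

    def select(self):
        # Epsilon-greedy stand-in for the photonic decision-maker.
        if random.random() < EPSILON:
            return random.randrange(len(self.value))
        return max(range(len(self.value)), key=lambda a: self.value[a])

    def update(self, arm, reward):
        self.count[arm] += 1
        self.value[arm] += (reward - self.value[arm]) / self.count[arm]

env = gym.make("CartPole-v1")
bandits = defaultdict(TwoArmedBandit)  # the parallel array, indexed by state

for episode in range(N_EPISODES):
    obs, _ = env.reset()
    visited, steps, done = [], 0, False
    while not done:
        state = discretize(obs)
        action = bandits[state].select()
        obs, reward, terminated, truncated, _ = env.step(action)
        visited.append((state, action))
        steps += 1
        done = terminated or truncated
    # Credit every visited (state, action) pair with the episode length,
    # a crude Monte-Carlo-style signal; the paper's reward assignment differs.
    for state, action in visited:
        bandits[state].update(action, steps)
    print(f"episode {episode}: balanced for {steps} steps")
```

In the paper's setting, the pseudorandom draws inside select() would be replaced by a correlated sequence such as a chaotic laser time series, which the abstract reports to yield faster adaptation than uniform pseudorandom numbers.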
Related papers
- Mastering the Digital Art of War: Developing Intelligent Combat Simulation Agents for Wargaming Using Hierarchical Reinforcement Learning [0.0]
This dissertation proposes a comprehensive approach, including targeted observation abstractions, multi-model integration, a hybrid AI framework, and an overarching hierarchical reinforcement learning framework.
Our localized observation abstraction using piecewise linear spatial decay simplifies the RL problem, enhancing computational efficiency and demonstrating superior efficacy over traditional global observation methods.
Our hybrid AI framework synergizes RL with scripted agents, leveraging RL for high-level decisions and scripted agents for lower-level tasks, enhancing adaptability, reliability, and performance.
arXiv Detail & Related papers (2024-08-23T18:50:57Z) - Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network [1.124958340749622]
We propose a photonics-based decision-making algorithm to address the competitive multi-armed bandit problem.
Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation.
arXiv Detail & Related papers (2024-07-12T09:38:47Z) - Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Bandit approach to conflict-free multi-agent Q-learning in view of
photonic implementation [0.0]
Previous studies have used quantum interference of photons to solve the competitive multi-armed bandit problem.
This study extends the conventional approach to a more general multi-agent reinforcement learning.
A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm.
arXiv Detail & Related papers (2022-12-20T00:27:29Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. A minimal illustrative sketch of these two ingredients appears after this list.
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance than other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
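The SUNRISE entry above names two concrete ingredients: Bellman targets re-weighted by a Q-ensemble's uncertainty, and action selection by the highest upper confidence bound. The following sketch illustrates those two operations in isolation; the function names, the sigmoid-style weighting, and the beta coefficient are assumptions for illustration, not the paper's exact formulas.

```python
# Hypothetical illustration of the two SUNRISE ingredients summarized above.
# Only the shape of each operation is shown; names and formulas are assumed.
import numpy as np

def weighted_bellman_target(q_next_ensemble, reward, gamma=0.99, temperature=10.0):
    """Bellman target plus an uncertainty-based weight for one transition.

    q_next_ensemble: shape (n_ensemble,), each member's next-state action value.
    Higher ensemble disagreement (std) -> smaller weight on this sample's loss.
    """
    target = reward + gamma * q_next_ensemble.mean()
    std = q_next_ensemble.std()
    weight = 1.0 / (1.0 + np.exp(temperature * std)) + 0.5  # assumed down-weighting
    return target, weight  # the weight would scale this transition's TD loss

def ucb_action(q_ensemble_per_action, beta=1.0):
    """Select the action with the highest upper confidence bound.

    q_ensemble_per_action: shape (n_ensemble, n_actions).
    """
    mean = q_ensemble_per_action.mean(axis=0)
    std = q_ensemble_per_action.std(axis=0)
    return int(np.argmax(mean + beta * std))

# Example usage with a toy ensemble of 5 Q-estimates over 2 actions.
q = np.array([[1.0, 0.4], [0.9, 0.8], [1.1, 0.5], [1.0, 0.6], [0.8, 0.7]])
print(ucb_action(q))                              # action with the best mean + std bound
print(weighted_bellman_target(q[:, 0], reward=1.0))
```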
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.