Strategizing against Q-learners: A Control-theoretical Approach
- URL: http://arxiv.org/abs/2403.08906v3
- Date: Tue, 16 Jul 2024 07:13:25 GMT
- Title: Strategizing against Q-learners: A Control-theoretical Approach
- Authors: Yuksel Arslantas, Ege Yuceel, Muhammed O. Sayin
- Abstract summary: We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm.
We present a quantization-based approximation scheme to tackle the continuum state space.
- Score: 1.3927943269211591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the susceptibility of the independent Q-learning algorithm (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation by sophisticated opponents in repeatedly played normal-form games. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm. To this end, we formulate the strategic actors' interactions as a stochastic game (whose state encompasses the Q-function estimates of the Q-learners) as if the Q-learning algorithms were the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space, and we analyze its performance, both analytically and numerically, for the cases of two competing strategic actors and a single strategic actor.
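To make the control-theoretic viewpoint concrete, here is a minimal sketch, not the paper's implementation: a 2x2 repeated game in which a strategic row player knows a stateless softmax Q-learner's update rule and treats the learner's Q-vector as the state of a controlled dynamical system. A crude one-step lookahead stands in for the paper's quantized approximation scheme, and the payoff matrices `A`, `B` and all parameters are illustrative.

```python
import numpy as np

# Hypothetical payoffs: A[i, j] for the strategic player, B[i, j] for the
# Q-learner, given row action i and column action j.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])

alpha, beta, horizon = 0.1, 5.0, 2000  # learning rate, softmax temperature, steps
rng = np.random.default_rng(0)

def softmax_policy(q):
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def q_update(q, j, reward):
    # Stateless Q-learning in a repeated normal-form game: one Q-value per action.
    q = q.copy()
    q[j] += alpha * (reward - q[j])
    return q

q_learner = np.zeros(2)  # the learner's Q-vector is the "system state"
total = 0.0
for t in range(horizon):
    pi = softmax_policy(q_learner)
    # Strategic player: greedy one-step lookahead, scoring immediate payoff
    # plus the payoff induced by the learner's *expected* next policy.
    best_i, best_score = 0, -np.inf
    for i in range(2):
        q_next = sum(pi[j] * q_update(q_learner, j, B[i, j]) for j in range(2))
        score = pi @ A[i] + softmax_policy(q_next) @ A[i]  # crude two-step proxy
        if score > best_score:
            best_i, best_score = i, score
    j = rng.choice(2, p=pi)          # learner samples from its current policy
    total += A[best_i, j]
    q_learner = q_update(q_learner, j, B[best_i, j])

print(f"strategic player's average payoff: {total / horizon:.3f}")
```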
Related papers
- Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy-regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
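As a concrete reference for this entry, a minimal tabular sketch of the soft Bellman backup, where a temperature-scaled log-sum-exp replaces the hard max of standard Q-learning; the state/action sizes in the usage lines are illustrative.

```python
import numpy as np

def soft_value(q_row, tau):
    # V(s) = tau * log sum_a exp(Q(s, a) / tau), computed with a max shift
    # for numerical stability.
    m = q_row.max()
    return m + tau * np.log(np.exp((q_row - m) / tau).sum())

def soft_q_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, tau=0.5):
    # Soft Bellman backup: bootstrap from the entropy-regularized value.
    target = r + gamma * soft_value(Q[s_next], tau)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 3))                       # 5 illustrative states, 3 actions
Q = soft_q_step(Q, s=0, a=1, r=1.0, s_next=2)
```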
arXiv Detail & Related papers (2024-03-11T01:36:37Z)
- Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization [21.30645601474163]
The original Q-learning algorithm suffers from performance and complexity challenges in very large networks.
A new model-free ensemble reinforcement learning algorithm that adapts classical Q-learning is proposed to handle these challenges.
Numerical results show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity.
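The summary does not spell out the ensemble mechanics, so purely as an illustration of the general idea, here is a hedged sketch of an ensemble of tabular Q-learners updated at different learning rates ("timescales") and queried through their mean; this construction is an assumption, not the paper's algorithm.

```python
import numpy as np

class EnsembleQ:
    """Illustrative ensemble of K tabular Q-estimates at different timescales."""

    def __init__(self, n_states, n_actions, alphas=(0.2, 0.05, 0.01), gamma=0.99):
        self.Q = np.zeros((len(alphas), n_states, n_actions))
        self.alphas = alphas
        self.gamma = gamma

    def act(self, s):
        # Greedy action against the ensemble mean.
        return int(self.Q.mean(axis=0)[s].argmax())

    def update(self, s, a, r, s_next):
        # Each member bootstraps from its own estimate at its own rate.
        for k, alpha in enumerate(self.alphas):
            target = r + self.gamma * self.Q[k, s_next].max()
            self.Q[k, s, a] += alpha * (target - self.Q[k, s, a])
```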
arXiv Detail & Related papers (2024-02-08T08:08:23Z)
- An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments [1.26404863283601]
This paper examines the factors influencing the frequency with which value-based multi-objective reinforcement learning (MORL) Q-learning algorithms learn the policy that is optimal under the scalarized expected return (SER) criterion.
We highlight the critical impact of noisy Q-value estimates on the stability and convergence of these algorithms.
arXiv Detail & Related papers (2024-01-06T08:43:08Z)
- Quantum Imitation Learning [74.15588381240795]
We propose quantum imitation learning (QIL) in the hope of using quantum advantage to speed up IL.
We develop two QIL algorithms, quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL).
Experimental results demonstrate that both Q-BC and Q-GAIL achieve performance comparable to their classical counterparts.
arXiv Detail & Related papers (2023-04-04T12:47:35Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
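For the sampling route mentioned above, a toy sketch of the check it would perform: sample points on the boundary of a candidate set and test that the learning dynamics point inward. The dynamics `f` and the box-shaped candidate are illustrative stand-ins, and passing the test is sampling evidence, not a proof.

```python
import numpy as np

def f(x):
    # Illustrative learning dynamics with a stable point at the origin.
    return -0.5 * x

def points_inward_on_box(f, lo, hi, n_samples=1000, rng=np.random.default_rng(1)):
    dim = len(lo)
    for _ in range(n_samples):
        x = rng.uniform(lo, hi)
        face = rng.integers(dim)                   # project onto a random face
        x[face] = lo[face] if rng.random() < 0.5 else hi[face]
        outward = 1.0 if x[face] == hi[face] else -1.0
        if outward * f(x)[face] > 0:               # flow crosses the face outward
            return False                           # not a trapping region
    return True                                    # no violation found in samples

print(points_inward_on_box(f, lo=np.array([-1.0, -1.0]), hi=np.array([1.0, 1.0])))
```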
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Problem-Dependent Power of Quantum Neural Networks on Multi-Class Classification [83.20479832949069]
Quantum neural networks (QNNs) have become an important tool for understanding the physical world, but their advantages and limitations are not fully understood.
Here we investigate the problem-dependent power of QNNs on multi-class classification tasks.
Our work sheds light on the problem-dependent power of QNNs and offers a practical tool for evaluating their potential merit.
arXiv Detail & Related papers (2022-12-29T10:46:40Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
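The "single Q-function" point rests on the standard soft-RL identity that the reward is implicit in Q through the inverse soft Bellman operator, so no adversarial discriminator is needed. A minimal tabular sketch of that identity, not the paper's full objective:

```python
import numpy as np

def soft_V(Q, s):
    # Soft value V(s) = log sum_a exp Q(s, a), with a max shift for stability.
    q = Q[s]
    m = q.max()
    return m + np.log(np.exp(q - m).sum())

def implied_reward(Q, s, a, s_next, gamma=0.99):
    # One-sample estimate of r(s, a) = Q(s, a) - gamma * E[V(s')] from a
    # single observed transition; purely illustrative.
    return Q[s, a] - gamma * soft_V(Q, s_next)
```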
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that information asymmetry between the agents can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
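A tabular sketch of these two ingredients; the sigmoid form of the weighting is an assumption for illustration (the summary does not specify it), and SUNRISE itself operates on deep off-policy learners rather than tables.

```python
import numpy as np

def ucb_action(Qs, s, lam=1.0):
    # (b) Optimistic action selection: mean + lam * std over the ensemble.
    mean, std = Qs[:, s].mean(axis=0), Qs[:, s].std(axis=0)
    return int((mean + lam * std).argmax())

def weighted_backup(Qs, s, a, r, s_next, alpha=0.1, gamma=0.99, temp=10.0):
    # (a) Weighted Bellman backup: down-weight targets where members disagree.
    next_q = Qs[:, s_next].max(axis=1)                    # per-member bootstrap
    w = 1.0 / (1.0 + np.exp(temp * next_q.std())) + 0.5   # sigmoid(-T*std) + 0.5
    for k in range(Qs.shape[0]):                          # independent updates
        target = r + gamma * next_q[k]
        Qs[k, s, a] += alpha * w * (target - Qs[k, s, a])
    return Qs
```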
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization [28.89692989420673]
We formalize a multi-agent fitted Q-iteration framework for analyzing factorized multi-agent Q-learning.
Through further analysis, we find that on-policy training or richer joint value function classes can improve its local or global convergence properties.
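As one concrete instance of the factorized value functions such an analysis covers, a minimal sketch of an additive (VDN-style) decomposition, where greedy joint action selection decomposes into per-agent argmaxes; the additive form is one common choice, assumed here for illustration.

```python
import numpy as np

n_agents, n_actions = 3, 4
rng = np.random.default_rng(0)
Q_i = rng.normal(size=(n_agents, n_actions))   # per-agent utilities at one state

joint_action = Q_i.argmax(axis=1)              # decentralized greedy selection
q_tot = Q_i.max(axis=1).sum()                  # Q_tot(joint) = sum_i Q_i(a_i)
print(joint_action, q_tot)
```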
arXiv Detail & Related papers (2020-05-31T19:14:03Z)
- Periodic Q-Learning [24.099046883918046]
We study the so-called periodic Q-learning algorithm (PQ-learning for short).
PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate.
In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy.
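A minimal tabular sketch of the online/target split described above, with the target synchronized to the online estimate every `period` steps; the constants and interface are illustrative.

```python
import numpy as np

def pq_learning_step(Q_online, Q_target, s, a, r, s_next, t,
                     alpha=0.1, gamma=0.99, period=100):
    # Update the online estimate against the frozen target estimate.
    target = r + gamma * Q_target[s_next].max()
    Q_online[s, a] += alpha * (target - Q_online[s, a])
    # Periodically synchronize the target to the online estimate.
    if (t + 1) % period == 0:
        Q_target[:] = Q_online
    return Q_online, Q_target
```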
arXiv Detail & Related papers (2020-02-23T00:33:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.