Adversarial Online Learning with Variable Plays in the Pursuit-Evasion
Game: Theoretical Foundations and Application in Connected and Automated
Vehicle Cybersecurity
- URL: http://arxiv.org/abs/2110.14078v1
- Date: Tue, 26 Oct 2021 23:09:42 GMT
- Title: Adversarial Online Learning with Variable Plays in the Pursuit-Evasion
Game: Theoretical Foundations and Application in Connected and Automated
Vehicle Cybersecurity
- Authors: Yiyang Wang, Neda Masoud
- Abstract summary: We extend the adversarial/non-stochastic multi-play multi-armed bandit (MPMAB) to the case where the number of arms to play is variable.
The work is motivated by the fact that the resources allocated to scan different critical locations in an interconnected transportation system change dynamically over time, depending on the environment.
- Score: 5.9774834479750805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We extend the adversarial/non-stochastic multi-play multi-armed bandit
(MPMAB) to the case where the number of arms to play is variable. The work is
motivated by the fact that the resources allocated to scan different critical
locations in an interconnected transportation system change dynamically over
time, depending on the environment. By modeling the malicious hacker and the
intrusion monitoring system as the attacker and the defender, respectively, we
formulate the problem for the two players as a sequential pursuit-evasion game.
We derive the condition under which a Nash equilibrium of the strategic game
exists. For the defender side, we provide an exponential-weighting-based
algorithm with sublinear pseudo-regret. We further extend our model to
heterogeneous rewards for both players, and obtain lower and upper bounds on
the average reward for the attacker. We provide numerical experiments to
demonstrate the effectiveness of variable-arm play.
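The abstract states that the defender's algorithm is exponential-weighting based with sublinear pseudo-regret. As an illustrative sketch of that general principle only — this is the classic single-play Exp3 algorithm for adversarial bandits, not the authors' variable-play extension, and the exploration rate `gamma` and the reward matrix below are assumptions for the demo:

```python
import math
import random

def exp3(n_arms, rewards, gamma=0.1):
    """Classic Exp3 exponential-weighting algorithm on an adversarial
    reward sequence. rewards[t][i] is the reward in [0, 1] of arm i at
    round t; only the played arm's reward is observed each round.
    Returns the total reward accumulated by the learner."""
    weights = [1.0] * n_arms
    total = 0.0
    for round_rewards in rewards:
        wsum = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = round_rewards[arm]
        total += reward
        # Importance-weighted estimate keeps the update unbiased even
        # though only the played arm's reward is revealed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return total
```

In a pursuit-evasion reading, each arm is a critical location the defender can scan; the exponential weights shift scanning probability toward locations that have yielded detections, while the `gamma` floor keeps every location scanned occasionally so an adversarial attacker cannot hide indefinitely.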
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-defender Security Games with Schedules [42.32444288821052]
Security games are often used to model strategic interactions in high-stakes security settings.
Many realistic scenarios feature multiple heterogeneous defenders with their own interests and priorities embedded in a more complex system.
We show that unlike prior work on multi-defender security games, the introduction of schedules can cause non-existence of equilibrium.
arXiv Detail & Related papers (2023-11-28T00:39:02Z)
- Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets [76.20705291443208]
We view adversarial attacks as a bargaining game in which different players negotiate to reach an agreement on a joint direction of parameter updating.
We design a novel framework that adjusts the budgets of different adversaries to avoid any player dominance.
Experiments on standard benchmarks show that applying the proposed framework to existing approaches significantly advances multi-target robustness.
arXiv Detail & Related papers (2023-06-27T14:02:10Z)
- Adversarial Machine Learning and Defense Game for NextG Signal Classification with Deep Learning [1.1726528038065764]
NextG systems can employ deep neural networks (DNNs) for various tasks such as user equipment identification, physical layer authentication, and detection of incumbent users.
This paper presents a game-theoretic framework to study the interactions of attack and defense for deep learning-based NextG signal classification.
arXiv Detail & Related papers (2022-12-22T15:13:03Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
arXiv Detail & Related papers (2022-10-24T09:32:40Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Visibility Optimization for Surveillance-Evasion Games [4.454557728745761]
We consider surveillance-evasion differential games, where a pursuer must try to constantly maintain visibility of a moving evader.
We use an upwind scheme to compute the feedback value function, corresponding to the end-game time of the differential game.
We show that Monte Carlo tree search and self-play reinforcement learning can train a deep neural network to generate reasonable strategies for on-line game play.
arXiv Detail & Related papers (2020-10-18T15:02:41Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has mostly been considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)
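The multi-player bandit entry above describes a collision model: a player earns an arm's reward only if no other player pulls the same arm in that round, and collisions pay 0. A minimal sketch of that reward rule, with the function name and argument layout chosen for illustration:

```python
from collections import Counter

def collision_rewards(pulls, arm_rewards):
    """Reward model from the summary above: pulls[j] is the arm chosen
    by player j this round, arm_rewards[a] is arm a's reward. A player
    receives the arm's reward only if it pulled that arm alone;
    any collision yields 0 for every colliding player."""
    counts = Counter(pulls)
    return [arm_rewards[a] if counts[a] == 1 else 0.0 for a in pulls]
```

For example, if two players both pull arm 0 while a third pulls arm 1 alone, the colliding players get 0 and only the third player collects arm 1's reward — which is what makes selfish (rather than cooperative) play strategically interesting in this model.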
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.