Visibility Optimization for Surveillance-Evasion Games
- URL: http://arxiv.org/abs/2010.09001v2
- Date: Sat, 26 Mar 2022 20:05:32 GMT
- Title: Visibility Optimization for Surveillance-Evasion Games
- Authors: Louis Ly and Yen-Hsi Richard Tsai
- Abstract summary: We consider surveillance-evasion differential games, where a pursuer must try to constantly maintain visibility of a moving evader.
We use an upwind scheme to compute the feedback value function, corresponding to the end-game time of the differential game.
We show that Monte Carlo tree search and self-play reinforcement learning can train a deep neural network to generate reasonable strategies for on-line game play.
- Score: 4.454557728745761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider surveillance-evasion differential games, where a pursuer must try
to constantly maintain visibility of a moving evader. The pursuer loses as soon
as the evader becomes occluded. Optimal controls for the game can be formulated as
a Hamilton-Jacobi-Isaacs equation. We use an upwind scheme to compute the
feedback value function, corresponding to the end-game time of the differential
game. Although the value function enables optimal controls, it is prohibitively
expensive to compute, even for a single pursuer and single evader on a small
grid. We consider a discrete variant of the surveillance-evasion game. We propose two
locally optimal strategies based on the static value function for the
surveillance-evasion game with multiple pursuers and evaders. We show that
Monte Carlo tree search and self-play reinforcement learning can train a deep
neural network to generate reasonable strategies for on-line game play. Given
enough computational resources and offline training time, the proposed model
can continue to improve its policies and efficiently scale to higher
resolutions.
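The two locally optimal strategies are derived from a static value function on the grid. As a rough illustration of that idea only, the sketch below implements a greedy one-step pursuer move that maximizes a visibility-based score against the evader's worst-case reply; the grid, the `score` heuristic, and all function names are hypothetical assumptions, not the authors' implementation.

```python
# Hypothetical sketch: a greedy, locally optimal pursuer move on a small
# occupancy grid. The scoring rule stands in for a precomputed static
# visibility/value function; all names and data are illustrative.

FREE, WALL = 0, 1
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
ROWS, COLS = len(GRID), len(GRID[0])
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # stay or move 4-connected

def in_bounds(c):
    r, k = c
    return 0 <= r < ROWS and 0 <= k < COLS and GRID[r][k] == FREE

def visible(a, b):
    # Coarse line-of-sight test: sample points along the segment a-b and
    # reject if any sampled cell is a wall. A real solver would use an
    # exact visibility computation on the grid.
    steps = 2 * max(abs(a[0] - b[0]), abs(a[1] - b[1])) + 1
    for i in range(steps + 1):
        t = i / steps
        r = round(a[0] + t * (b[0] - a[0]))
        k = round(a[1] + t * (b[1] - a[1]))
        if GRID[r][k] == WALL:
            return False
    return True

def score(p, e):
    # Illustrative stand-in for a static value: how many of the evader's
    # one-step successor cells the pursuer at p would still see.
    return sum(
        1
        for m in MOVES
        if in_bounds((e[0] + m[0], e[1] + m[1]))
        and visible(p, (e[0] + m[0], e[1] + m[1]))
    )

def greedy_pursuer_move(p, e):
    # Locally optimal choice: pick the pursuer move that maximizes the score
    # under the pessimistic assumption that the evader then minimizes it.
    best_move, best_val = (0, 0), float("-inf")
    for m in MOVES:
        p2 = (p[0] + m[0], p[1] + m[1])
        if not in_bounds(p2):
            continue
        worst = min(
            score(p2, (e[0] + n[0], e[1] + n[1]))
            for n in MOVES
            if in_bounds((e[0] + n[0], e[1] + n[1]))
        )
        if worst > best_val:
            best_val, best_move = worst, m
    return best_move

if __name__ == "__main__":
    print(greedy_pursuer_move((0, 0), (2, 3)))
```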
Related papers
- Patrol Security Game: Defending Against Adversary with Freedom in Attack Timing, Location, and Duration [4.765278970103286]
Patrol Security Game (PSG) is a robotic patrolling problem modeled as an extensive-form deterministic Stackelberg problem.
Our objective is to devise a synthetic schedule that minimizes the attacker's time horizon.
arXiv Detail & Related papers (2024-10-21T02:53:18Z) - Adversarial Knapsack and Secondary Effects of Common Information for Cyber Operations [0.9378911615939924]
We formalize a dynamical network control game for Capture the Flag (CTF) competitions and detail the static game for each time step.
We define the Adversarial Knapsack optimization problems as a system of interacting Weighted Knapsack problems.
Common awareness of the scenario, rewards, and costs will set the stage for a non-cooperative game.
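For reference, the single Weighted Knapsack building block that the Adversarial Knapsack formulation couples together can be solved with the standard 0/1 dynamic program; the item values, weights, and capacity below are illustrative only, not taken from the paper.

```python
# Standard 0/1 weighted knapsack solved by dynamic programming. The paper
# couples several such problems through shared information; this sketch only
# fixes the single-knapsack building block.

def knapsack(values, weights, capacity):
    """Return the best total value using each item at most once within capacity."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

if __name__ == "__main__":
    # Illustrative attacker instance: values = expected payoff of each action,
    # weights = resource cost, capacity = total resources available.
    print(knapsack(values=[6, 10, 12], weights=[1, 2, 3], capacity=5))  # -> 22
```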
arXiv Detail & Related papers (2024-03-16T03:41:12Z) - Scaling Laws for Imitation Learning in Single-Agent Games [29.941613597833133]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents.
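Such power laws are typically fit as straight lines in log-log space; a minimal sketch of that fitting step, using made-up (compute, loss) pairs rather than the paper's data, is shown below.

```python
# A power law loss ≈ a * compute^(-b) is linear in log-log space:
# log(loss) = log(a) - b * log(compute), so fitting reduces to least squares.
# The data points below are invented purely to illustrate the fit.

import numpy as np

compute = np.array([1e14, 1e15, 1e16, 1e17, 1e18])  # hypothetical FLOPs
loss = np.array([2.0, 1.45, 1.05, 0.76, 0.55])       # hypothetical IL loss

slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: loss ≈ {a:.3g} * compute^(-{b:.3g})")
```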
arXiv Detail & Related papers (2023-07-18T16:43:03Z) - Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
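For context, a common duality-gap notion of regret in episodic zero-sum Markov games is sketched below; the paper's exact definition may differ.

```latex
% Regret over K episodes as the cumulative duality gap of the played policy
% pairs (mu_k, nu_k), measured at the initial state s_1:
\[
  \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}
  \Bigl( V^{\dagger,\,\nu_k}(s_1) - V^{\mu_k,\,\dagger}(s_1) \Bigr)
  \;\le\; \widetilde{\mathcal{O}}\!\bigl(\sqrt{K}\bigr),
\]
% where V^{dagger,nu_k} is the best-response value against nu_k and
% V^{mu_k,dagger} is the value of mu_k against its best response.
```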
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z) - Adversarial Online Learning with Variable Plays in the Pursuit-Evasion
Game: Theoretical Foundations and Application in Connected and Automated
Vehicle Cybersecurity [5.9774834479750805]
We extend the adversarial/non-stochastic multi-play multi-armed bandit (MPMAB) to the case where the number of arms to play is variable.
The work is motivated by the fact that the resources allocated to scan different critical locations in an interconnected transportation system change dynamically over time, depending on the environment.
arXiv Detail & Related papers (2021-10-26T23:09:42Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and lower running time.
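As a generic illustration of what a purely supervised policy update looks like (a behavior-cloning-style cross-entropy loss on stored trajectories, not SSRL's exact objective), consider the following sketch.

```python
# Minimal illustration of optimizing a policy with a purely supervised loss:
# cross-entropy between the policy's action logits and actions taken in
# previously collected (e.g., high-return) trajectories.

import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake batch of (observation, action) pairs standing in for stored trajectories.
obs = torch.randn(32, obs_dim)
actions = torch.randint(0, n_actions, (32,))

logits = policy(obs)
loss = loss_fn(logits, actions)  # purely supervised: no value or policy gradient
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```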
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Disentangling Controllable Object through Video Prediction Improves
Visual Reinforcement Learning [82.25034245150582]
In many vision-based reinforcement learning problems, the agent controls a movable object in its visual field.
We propose an end-to-end learning framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional observation channels to the agent.
arXiv Detail & Related papers (2020-02-21T05:43:34Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)