Robust Reinforcement Learning Under Minimax Regret for Green Security
- URL: http://arxiv.org/abs/2106.08413v1
- Date: Tue, 15 Jun 2021 20:11:12 GMT
- Title: Robust Reinforcement Learning Under Minimax Regret for Green Security
- Authors: Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, Milind Tambe
- Abstract summary: Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers.
We focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature.
We formulate the problem as a game between the defender and nature, which controls the parameter values of the adversarial behavior, and design an algorithm, MIRROR, to find a robust policy.
- Score: 50.03819244940543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Green security domains feature defenders who plan patrols in the face of
uncertainty about the adversarial behavior of poachers, illegal loggers, and
illegal fishers. Importantly, the deterrence effect of patrols on adversaries'
future behavior makes patrol planning a sequential decision-making problem.
Therefore, we focus on robust sequential patrol planning for green security
following the minimax regret criterion, which has not been considered in the
literature. We formulate the problem as a game between the defender and nature,
which controls the parameter values of the adversarial behavior, and design an
algorithm, MIRROR, to find a robust policy. MIRROR uses two reinforcement
learning-based oracles and solves a restricted game considering limited
defender strategies and parameter values. We evaluate MIRROR on real-world
poaching data.
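
The abstract specifies only the shape of MIRROR: a game against nature whose payoff is regret, i.e. regret(pi, theta) = V*(theta) - V(pi, theta), the gap between the best attainable value under parameters theta and the value the defender's policy pi actually achieves, solved by repeatedly expanding a restricted game with two best-response oracles. The sketch below shows that generic double-oracle skeleton under those assumptions; every name is a hypothetical stand-in, the paper's RL-based oracles are abstracted as plain callables, and none of this is the authors' implementation.

```python
# A minimal double-oracle sketch for minimax regret, in the spirit of the
# MIRROR description above. All names are hypothetical stand-ins; the paper's
# two RL-based oracles are abstracted as plain callables.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(R):
    """Solve a zero-sum matrix game (row player minimizes payoff R).

    Returns (row_mixture, col_mixture, game_value)."""
    m, n = R.shape
    # Row player: minimize v subject to x^T R <= v*1, x a distribution.
    res = linprog(c=np.r_[np.zeros(m), 1.0],
                  A_ub=np.c_[R.T, -np.ones(n)], b_ub=np.zeros(n),
                  A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    x, v = res.x[:m], res.x[m]
    # Column player: maximize u subject to R y >= u*1, y a distribution.
    res = linprog(c=np.r_[np.zeros(n), -1.0],
                  A_ub=np.c_[-R, np.ones(m)], b_ub=np.zeros(m),
                  A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return x, res.x[:n], v

def double_oracle_minimax_regret(defender_oracle, nature_oracle, regret,
                                 init_policy, init_params,
                                 max_iters=50, tol=1e-4):
    """Grow a restricted game until neither oracle can improve.

    defender_oracle(params, weights) -> new policy (an RL agent in MIRROR)
    nature_oracle(policies, weights) -> new parameter values
    regret(policy, params)           -> optimal value minus achieved value"""
    policies, params = [init_policy], [init_params]
    for _ in range(max_iters):
        # Restricted game: regret matrix over the strategies found so far.
        R = np.array([[regret(pi, th) for th in params] for pi in policies])
        def_mix, nat_mix, value = solve_zero_sum(R)
        new_pi = defender_oracle(params, nat_mix)   # best response to nature
        new_th = nature_oracle(policies, def_mix)   # regret-maximizing params
        # Expected regrets of the proposed best responses vs. the mixtures.
        gain_d = value - sum(w * regret(new_pi, th)
                             for th, w in zip(params, nat_mix))
        gain_n = sum(w * regret(pi, new_th)
                     for pi, w in zip(policies, def_mix)) - value
        if gain_d <= tol and gain_n <= tol:         # neither side can improve
            return policies, def_mix, value
        policies.append(new_pi)
        params.append(new_th)
    # Budget exhausted: return the mixture for the final restricted game.
    R = np.array([[regret(pi, th) for th in params] for pi in policies])
    def_mix, _, value = solve_zero_sum(R)
    return policies, def_mix, value
```

In the paper the two oracles are reinforcement learners; here they are opaque callables so the game-theoretic loop stands on its own.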
Related papers
- Patrol Security Game: Defending Against Adversary with Freedom in Attack Timing, Location, and Duration [4.765278970103286]
Patrol Security Game (PSG) is a robotic patrolling problem modeled as an extensive-form deterministic Stackelberg game.
Our objective is to devise a synthetic schedule that minimizes the attacker's time horizon.
arXiv Detail & Related papers (2024-10-21T02:53:18Z)
- Preserving the Privacy of Reward Functions in MDPs through Deception [13.664014596337037]
Preserving the privacy of preferences (or rewards) of a sequential decision-making agent when decisions are observable is crucial in many physical and cybersecurity domains.
This paper addresses privacy preservation in planning over a sequence of actions in MDPs, where the reward function represents the preference structure to be protected.
arXiv Detail & Related papers (2024-07-13T09:03:22Z)
- Refining Minimax Regret for Unsupervised Environment Design [15.281908507614512]
We introduce Bayesian level-perfect (BLP) MMR, a refinement of the minimax regret objective.
We show that BLP policies act consistently with a Perfect Bayesian policy over all levels.
We also introduce an algorithm, ReMiDi, that results in a BLP policy at convergence.
arXiv Detail & Related papers (2024-02-19T16:51:29Z)
- On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z)
- Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully-observable robot policy that generates supervision for a partially-observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera in pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z)
- Game-theoretic Objective Space Planning [4.989480853499916]
Understanding the intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments.
Current approaches either oversimplify the discretization of the action space of agents or fail to recognize the long-term effect of actions and become myopic.
We propose a novel dimension reduction method that encapsulates diverse agent behaviors while preserving the continuity of agent actions.
arXiv Detail & Related papers (2022-09-16T07:35:20Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that policies learned with low exploration probability values are more robust to corrupted rewards (see the reward-perturbation sketch after this list).
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security [67.29846393678808]
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders.
We formulate the problem as a multi-armed bandit, where each action represents a patrol strategy; a minimal bandit sketch appears after this list.
We show that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
arXiv Detail & Related papers (2020-09-14T16:40:44Z)
- Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the benign overfitting observed with overparameterized supervised models also occurs for offline contextual bandits.
We show that this discrepancy is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
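
Two of the entries above describe mechanisms concrete enough to sketch. First, the corrupted-rewards paper studies learners trained on perturbed reward signals; a minimal way to reproduce that setup, assuming a Gymnasium-style environment, is a wrapper that corrupts rewards before the agent observes them. The class and parameter names below are illustrative, not that paper's code.

```python
# Illustrative reward-corruption wrapper for the "Disturbing Reinforcement
# Learning Agents with Corrupted Rewards" entry: the agent trains on rewards
# perturbed before it observes them. Names are assumptions, not the paper's code.
import numpy as np
import gymnasium as gym

class CorruptedReward(gym.RewardWrapper):
    """Smoothly perturbs (and optionally sign-flips) the true reward."""

    def __init__(self, env, noise_std=0.5, flip_prob=0.0, seed=0):
        super().__init__(env)
        self.noise_std = noise_std
        self.flip_prob = flip_prob
        self.rng = np.random.default_rng(seed)

    def reward(self, r):
        if self.rng.random() < self.flip_prob:
            r = -r                                       # adversarial sign flip
        return r + self.rng.normal(0.0, self.noise_std)  # smooth Gaussian noise

# Hypothetical usage: env = CorruptedReward(gym.make("CartPole-v1"), noise_std=0.3)
```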
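
Second, the Dual-Mandate Patrols entry casts patrol planning as a multi-armed bandit whose arms are patrol strategies. As a reference point only (LIZARD itself also exploits smoothness of the reward function and decomposability of the action space, which this sketch omits), a plain UCB1 loop over hypothetical patrol arms looks like this:

```python
# Minimal UCB1 bandit over patrol strategies, as a reference point for the
# "Dual-Mandate Patrols" entry. This is NOT the paper's LIZARD algorithm;
# pull_arm is a hypothetical simulator for observed patrol outcomes.
import math
import random

def ucb1(n_arms, pull_arm, horizon=1000):
    counts = [0] * n_arms   # times each patrol strategy was tried
    means = [0.0] * n_arms  # running mean reward per strategy
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: try every strategy once
        else:
            # Optimism in the face of uncertainty: mean plus exploration bonus.
            arm = max(range(n_arms), key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Hypothetical usage: the reward could be, e.g., snares removed per patrol.
means, counts = ucb1(5, pull_arm=lambda a: random.betavariate(1 + a, 6 - a))
```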